continuous change with respect to parameter variation, the number of observable periodic windows 
topological change is very common with small parameter perturbation. However, this seemingly 
inevitable topological variation is never catastrophic (the dynamic type is preserved) if the dimension 
of the system is high enough. 

PACS numbers: 05.45.-a, 89.75.-k, 05.45.Tp, 89.70.+C, 89.20.Ff 

Keywords: Partial hyperbolicity, dynamical systems, structural stability, stability conjecture, Lyapunov 
exponents, complex systems 






I. INTRODUCTION 

Much of the work in the fields of dynamical systems
and differential equations has, for the last hundred
years, entailed the classification and understanding of the
qualitative features of the space of differentiable mappings.
A primary focus is the classification of topological
differences between different systems (e.g. structural
stability theory). One of the primary difficulties
is choosing a notion of behavior that is not so strict that
it differentiates on too trivial a level, yet is strict enough
that it has some meaning (Palis-Smale used topological
equivalence, Pugh-Shub use ergodicity). The previous
stability conjectures are with respect to any $C^r$ ($r$
varies from conjecture to conjecture) perturbation, allowing
for variation of the mapping both in functional
form (with respect to the Whitney $C^r$ topology) and in
parameters. We will concern ourselves with the
latter issue. Unlike much work involving stability
conjectures, our work is numerical, and it focuses on
observable asymptotic behaviors in high-dimensional systems.
Our chief claim is that generally, for high-dimensional
dynamical systems in our construction, there exist large
portions of parameter space such that topological variation
inevitably accompanies parameter variation, yet the
topological variation happens in a "smooth," non-erratic
manner. Let us state our results without rigor, noting
that we will save more rigorous statements for section III.



Statement of Results 1 (Informal) Given our particular
impositions (sections II A 4 and II A 1) upon
the space of $C^r$ discrete-time maps from compact sets to
themselves, and an invariant measure (used for calculating
Lyapunov exponents), in the limit of high dimension,
there exists a subset of parameter space such that strict
hyperbolicity is violated on a nearly dense (and hence
unavoidable), yet zero-measure (with respect to Lebesgue
measure), subset of parameter space.

A more refined version of this statement will contain all of
our results. For mathematicians, we note that although
the stability conjecture of Palis and Smale is
true (as proved by Robbin [2], Robinson, and Mañé),
we show that in high dimensions, this structural
stability may occur over such small sets in the parameter
space that it may never be observed in chaotic regimes
of parameter space. Nevertheless, this lack of observable
structural stability has very mild consequences for
applied scientists.



A. Outline 

As this paper is attempting to reach a diverse readership,
we will briefly outline the work for ease of reading.
Of the remaining introduction sections, section I B can
be skipped by readers familiar with the stability conjecture
of Smale and Palis and the stable ergodicity of Pugh
and Shub.

Following the introduction we will address various
preliminary topics pertaining to this report. Beginning in
section II A 1, we present the mathematical justification
for the study of time-delay maps being sufficient
for a general study of d > 1 dimensional dynamical systems.
This section is followed with a discussion of neural
networks, beginning with their definition in the abstract
(section II A 2). Following the definition of neural networks,
we explain the mappings neural networks are able
to approximate (section II A 3). In section II A 4 we
give our specific construction of neural networks. Those
uninterested in the mathematical justifications for our
models and only interested in our specific formulation
should skip sections II A 1 through II A 3 and concentrate
on section II A 4. The discussion of our set of mappings
is followed by relevant definitions from hyperbolicity and
ergodic theory (section II B). It is here that we define
the Lyapunov spectrum and hyperbolic maps, and discuss
relevant stability conjectures. Section II C provides
justification for our use of Lyapunov exponent calculations
upon our space of mappings (the neural networks).
Readers familiar with topics in hyperbolicity and ergodic
theory can skip this section and refer to it as needed for
an understanding of the results. Lastly, in section II D,
we make a series of definitions we will need for our
numerical arguments. Without an understanding of these
definitions, it is difficult to understand both our
conjectures and our arguments.

Section III discusses the conjectures we wish to
investigate formally. For those interested in just the results
of this report, reading sections II D, III, and VII will
suffice. The next section, section IV, discusses the
errors present in our chief numerical tool, the Lyapunov
spectrum. This section is necessary for a careful
understanding of this report, but is easily
skipped upon first reading. We then begin our
preliminary numerical arguments. Section V addresses the
three major properties we need to argue for our conjectures.
For an understanding of our arguments and why
our conclusions make sense, reading this section is necessary.
The main arguments regarding our conjectures follow
in section VI. It is in this section that we make the
case for the claims of section III. The summary section
(section VII) begins with a summary of our numerical
arguments and how they apply to our conjectures. We
then interpret our results in light of various stability
conjectures and other results from the dynamics community.



B. Background 

To present a full background with respect to the topics
and motivations for our study would be out of place
in this report. We will instead discuss the roots of our
problems and a few relevant highlights, leaving the reader
with references to the survey papers of Burns et al.,
Pugh and Shub [6], Palis [7], and Nitecki for a more
thorough introduction.

The origin of our work, as with all of dynamical systems,
lies with Poincaré, who split the study of dynamics
in mathematics into two categories, conservative and
dissipative systems; we will be concerned with the latter.
We will refrain from beginning with Poincaré and instead
begin in the 1960's with the pursuit of the "lost dream."

The "dream" amounted to the conjecture that structurally
stable dynamical systems would be dense among
all dynamical systems. For mathematicians, the dream
was motivated primarily by a desire to classify dynamical
systems via their topological behavior. For physicists and
other scientists, however, this dream was two-fold. First,
since dynamical systems (via differential equations and
discrete-time maps) are usually used to model physical
phenomena, a geometrical understanding of how these
systems behave in general is, from an intuitive standpoint,
very insightful. Second, there is a more practical
motivation for the stability dream. Most experimental
scientists who work on highly nonlinear systems (e.g.
plasma physics and fluid dynamics) are painfully aware
of the existence of the dynamic stability that the
mathematicians were hoping to capture with the stability
conjecture of Palis and Smale. When we write dynamic
stability we do not mean fixed point versus chaotic
dynamics; rather we mean that upon normal or induced
experimental perturbations, dynamic types are highly
persistent. Experimentalists have been attempting to
control and eliminate turbulence and chaos since they began
performing experiments. It is clear from our experience
that turbulence and chaos are highly stable with respect
to perturbations in highly complicated dynamical systems;
the question is why and how this stability arises, and what
the right notion of equivalence is to capture it.
In a practical sense, the hope is that, if the
geometric characteristics that allow chaos to persist can
be understood, it might be easier to control or even
eliminate those characteristics. At the very least, it would
be useful to know precisely why we cannot
control or rid our systems of turbulent behavior. At any
rate, the dream was "lost" in the late 1960's via many
counterexamples, leaving room for a very rich theory.
Conjectures regarding weaker forms of the dream
for which a subset of "nice" diffeomorphisms would be
dense were put forth; many lasted less than a day, and
none worked. The revival of the dream in the 1990's
involved a different notion of nice: stable ergodicity.

Near the time of the demise of the "dream," the notion
of structural stability together with Smale's notion
of hyperbolicity was used to formulate the stability
conjecture (the connection between structural stability and
hyperbolicity, now a theorem). The stability conjecture
says that "a system is $C^r$ stable if its limit set is
hyperbolic and, moreover, stable and unstable manifolds
meet transversally at all points."

To attack the stability conjecture, Smale had introduced
axiom A. Dynamical systems that satisfy axiom
A are strictly hyperbolic (definition 6) and have dense
periodic points on the non-wandering set. A further
condition that was needed is the strong transversality
condition: $f$ satisfies the strong transversality condition
when, for every $x \in M$, the stable and unstable manifolds
$W^s(x)$ and $W^u(x)$ are transverse at $x$. That axiom A
and strong transversality imply $C^r$ structural stability
was shown by Robbin [2] for $r \geq 2$ and Robinson for
$r = 1$. The other direction of the stability conjecture was
much more elusive, yet in 1988 this was shown by Mañé
for $r = 1$.
Nevertheless, due to many examples of structurally
unstable systems being dense amongst many "common"
types of dynamical systems, proposing some global structure
for a space of dynamical systems became much more
unlikely. Newhouse was able to show that infinitely
many sinks occur for a residual subset of an open set
of diffeomorphisms near a system exhibiting a
homoclinic tangency. Further, it was discovered that orbits
can be highly sensitive to initial conditions.
Much of the sensitivity to initial conditions was
investigated numerically by non-mathematicians. Together,
the examples from both pure mathematics and
the sciences sealed the demise of the "dream" (via
topological notions), yet they opened the door for a wonderful
and diverse theory. Nevertheless, despite the fact that
structural stability does not capture all we wish it to
capture, it is still a very useful, intuitive tool.

Again, from a physical perspective, the question of the
existence of dynamic stability is not open: physicists
and engineers have been trying to suppress chaos and
turbulence in high-dimensional systems for several hundred
years. The trick in mathematics is writing down
a relevant notion of dynamic stability and then the
relevant necessary geometrical characteristics to guarantee
dynamic stability. From the perspective of modeling nature,
structural stability says that if one selects (fits) a
model equation, small errors will be irrelevant since small
$C^r$ perturbations will yield topologically equivalent models.
It is the topological equivalence that is too strong
a characteristic for structural stability to apply to the
broad range of systems we wish it to apply to. Structural
stability is difficult to use in a very practical way
because it is very difficult to show (or disprove the
existence of) topological ($C^0$) equivalence of a neighborhood
of maps. Hyperbolicity can be much easier to handle
numerically, yet it is not always common. Luckily, to
quote Pugh and Shub, "a little hyperbolicity goes a
long way in guaranteeing stably ergodic behavior." This
thesis has driven the partial hyperbolicity branch of
dynamical systems and is our claim as well. We will define
precisely what we mean by partial hyperbolicity and will
discuss relevant results à la stable ergodicity and partial
hyperbolicity.

Our investigation will, in a practical, computational
context, examine how ergodic behavior and topological
variation (versus parameter variation)
behave given a "little bit" of hyperbolicity. Further, we
will investigate one of the overall haunting questions:
how much of the space of bounded $C^r$ ($r > 0$) systems
is hyperbolic, and how many of the pathologies found by
Newhouse and others are observable (or even existent)
in the space of bounded $C^r$ dynamical systems? Stated
more generally, how does hyperbolicity (and thus structural
stability) "behave" in a space of bounded $C^r$
dynamical systems?



II. DEFINITIONS AND PRELIMINARIES 

In this section we will define the following items: the 
family of dynamical systems we wish to investigate; the 
function space we will use in our experiments; Lyapunov 
exponents; and finally we will list definitions specific to 
our numerical arguments. The choice of scalar neural 
networks as our maps of choice is motivated by their be- 
ing "universal approximators." 



A. Our space of mappings 

The motivation and construction of the set of mappings
we will use for our investigation of dynamical systems
follows via two directions, the embedding theorem
of Takens [16, 17] and the neural network approximation
theorems of Hornik, Stinchcombe, and White
[18]. We will use the Takens embedding theorem to
demonstrate how studying time-delayed maps of the form
$f : R^d \to R$ is a natural choice for studying standard
dynamical systems of the form $F : R^d \to R^d$. This is
important as we will be using time-delayed scalar neural
networks for our study. The neural network approximation
theorems show that neural networks of a particular
form are open and dense in several very general sets of
functions and thus can be used to approximate any function
in the allowed function spaces.

There is overlap, in a sense, between these two
constructions. The embedding theory shows an equivalence
between scalar time-delay dynamics and standard
$x_{t+1} = F(x_t)$ ($x_t \in R^d$) dynamics.
There is no mention of, in a practical sense, the explicit
functions in the Takens construction. The neural network
approximation results show, in a precise and practical
way, what a neural network is and what functions
it can approximate. They say that neural networks can
approximate the $C^r(R^d)$ mappings and their derivatives,
but there is no mention of the time-delays we wish to use.
Thus we need to discuss both the embedding theory and
the neural network approximation theorems.

Those not interested in the mathematical justification
of our construction may skip to section II A 4, where we
define, in a concrete manner, our neural networks.



1. Dynamical systems construction 

We wish, in this report, to investigate dynamical systems
on compact sets. Specifically, begin with a compact
manifold $M$ of dimension $d$ and a diffeomorphism
$F \in C^r(M)$ for $r \geq 2$ defined as:

$$x_{t+1} = F(x_t) \tag{1}$$

with $x_t \in M$. However, for computational reasons, we
will be investigating this space with neural networks that
can approximate (see section II A 3) dynamical systems
$f \in C^r(R^d, R)$ that are time-delay maps given by:

$$y_{t+1} = f(y_t, y_{t-1}, \ldots, y_{t-(d-1)}) \tag{2}$$

where $y_t \in R$. Both systems (1) and (2) form dynamical
systems. However, since we intend to use systems of the
form (2) to investigate the space of dynamical systems as
given in equation (1), we must show that a study of mappings
of the form (2) is somehow equivalent to mappings
of the form (1). We will demonstrate this by employing
an embedding theorem of Takens to demonstrate the
relationship between time-delay maps and non-time-delay
maps in a more general and formal setting.

We call $g \in C^k(M, R^n)$ an embedding if $k \geq 1$ and
if the local derivative map (the Jacobian, the first order
term in the Taylor expansion) is one-to-one for every
point $x \in M$ (i.e. $g$ must be an immersion). The
idea of the Takens embedding theorem is that given a
$d$-dimensional dynamical system and a "measurement
function," $E : M \to R$ ($E$ is a $C^k$ map), where $E$
represents some empirical style measurement of $F$, there
is a Takens map (which does the embedding) $g$ for
which $x \in M$ can be represented as a $2d+1$ tuple
$(E(x), E \circ F(x), E \circ F^2(x), \ldots, E \circ F^{2d}(x))$ where $F$ is an
ordinary difference equation (time evolution operator) on
$M$. Note that the $2d+1$ tuple is a time-delay map of $x$.
We can now state the Takens embedding theorem:

Theorem 1 (Takens' embedding theorem [16, 17])
Let $M$ be a compact manifold with dimension $d$.
There is an open dense subset $S \subset \mathrm{Diff}(M) \times C^k(M, R)$
with the property that the Takens map

$$g : M \to R^{2d+1} \tag{3}$$

given by $g(x) = (E(x), E \circ F(x), E \circ F^2(x), \ldots, E \circ F^{2d}(x))$
is an embedding of manifolds, when $(F, E) \in S$.

Here $\mathrm{Diff}(M)$ is the space of diffeomorphisms from
$M$ to itself with the subspace topology from $C^k(M, M)$.
Thus, there is an equivalence between time-delayed Takens
maps of "measurements" and the "actual" dynamical
system operating in time on $x_t \in M$. This equivalence is
that of an embedding (the Takens map), $g : M \to R^{2d+1}$.

To demonstrate how this applies to our circumstances,
consider figure 1, in which $F$ and $E$ are as given above
and the embedding $g$ is explicitly given by:

$$g(x_t) = (E(x_t), E(F(x_t)), \ldots, E(F^{2d}(x_t))) \tag{4}$$

In a colloquial, experimental sense, $\tilde{F}$ just keeps track of
the observations from the measurement function $E$, and,
at each time step, shifts the newest observation into the
$2d+1$ tuple and sequentially shifts the scalar observation
at time $t$ ($y_t$) of the $2d+1$ tuple to the $t-1$ position
of the $2d+1$ tuple. In more explicit notation, $\tilde{F}$ is the
following mapping:

$$(y_1, \ldots, y_{2d+1}) \mapsto (y_2, \ldots, y_{2d+1}, g(F(g^{-1}(y_1, \ldots, y_{2d+1})))) \tag{5}$$

where, again, $\tilde{F} = g \circ F \circ g^{-1}$. The neural networks we
will propose in the sections that follow can approximate
$\tilde{F}$ and its derivatives (to any order) to arbitrary accuracy
(a notion we will make more precise later).
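
The shift structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Hénon map stands in for $F$ and the first coordinate stands in for the measurement function $E$; neither choice comes from the text, and the function names are hypothetical.

```python
def henon(state, a=1.4, b=0.3):
    """Illustrative stand-in for F: the Henon map on R^2 (assumption, d = 2)."""
    x, y = state
    return (1.0 - a * x * x + y, b * x)


def measure(state):
    """Illustrative stand-in for E: M -> R (first coordinate, assumption)."""
    return state[0]


def takens_map(state, F, E, d):
    """g(x) = (E(x), E(F(x)), ..., E(F^{2d}(x))), a (2d+1)-tuple."""
    out = []
    for _ in range(2 * d + 1):
        out.append(E(state))
        state = F(state)
    return tuple(out)


def shift(delay_vec, newest):
    """Drop the oldest observation and append the newest one."""
    return delay_vec[1:] + (newest,)


d = 2
x0 = (0.1, 0.1)
g_x0 = takens_map(x0, henon, measure, d)

# The newest observation needed to advance the tuple is E(F^{2d+1}(x0)).
state = x0
for _ in range(2 * d + 1):
    state = henon(state)
advanced = shift(g_x0, measure(state))

# Advancing the delay tuple should agree with embedding the advanced state,
# i.e. the shifted g(x0) equals g(F(x0)).
g_Fx0 = takens_map(henon(x0), henon, measure, d)
```

Comparing `advanced` with `g_Fx0` checks, for this one orbit, that stepping the delay tuple and stepping the underlying state commute through $g$.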

Let us summarize what we are attempting to do: we
wish to investigate dynamical systems given by (1), but
for computational reasons we wish to use dynamical systems
given by (2). The Takens embedding theorem says
that dynamical systems of the form (1) can be generically
represented (via the Takens embedding map $g$) by
time-delay dynamical systems of the form (5). Since neural
networks will approximate dynamical systems of the
form (5) on a compact and metrizable set, it will suffice
for our investigation of dynamical systems of the form (1)
to consider the space of neural networks mapping compact
sets to compact sets as given in section II A 2.



2. Abstract neural networks 

Begin by noting that, in general, a neural network is
a $C^r$ mapping $\gamma : R^n \to R$. More specifically, the set of
feedforward networks with a single hidden layer, $\Sigma(G)$,
can be written:

$$\Sigma(G) = \{\gamma : R^d \to R \mid \gamma(x) = \sum_{i=1}^{N} \beta_i G(\tilde{x}^T \omega_i)\} \tag{6}$$

where $x \in R^d$ is the $d$-vector of network inputs,
$\tilde{x}^T = (1, x^T)$ (where $x^T$ is the transpose of $x$), $N$ is the
number of hidden units (neurons), $\beta_1, \ldots, \beta_N \in R$ are
the hidden-to-output layer weights, $\omega_1, \ldots, \omega_N \in R^{d+1}$
are the input-to-hidden layer weights, and $G : R \to R$
is the hidden layer activation function (or neuron). The
partial derivatives of the network output function, $\gamma$, are

$$\frac{\partial \gamma(x)}{\partial x_k} = \sum_{i=1}^{N} \beta_i \omega_{ik} DG(\tilde{x}^T \omega_i) \tag{7}$$

where $x_k$ is the $k$th component of the $x$ vector, $\omega_{ik}$ is the
$k$th component of $\omega_i$, and $DG$ is the usual first derivative
of $G$. The matrix of partial derivatives (the Jacobian)
takes a particularly simple form when the $x$ vector is a
sequence of time delays ($x_t = (y_t, y_{t-1}, \ldots, y_{t-(d-1)})$ for
$x_t \in R^d$ and $y_i \in R$). It is for precisely this reason that
we choose the time-delayed formulation.
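
A minimal sketch of the network output and its partial derivatives follows, assuming $G = \tanh$ and small, arbitrary values of $N$ and $d$ chosen only for illustration. The helper names are hypothetical. The Jacobian routine assumes the usual state-space form of a delay map, in which the first row is the gradient of the network and the remaining rows simply shift the delays.

```python
import math
import random


def pre_activation(x, om):
    """x_tilde . omega_i with x_tilde = (1, x); om[0] is the bias weight."""
    return om[0] + sum(w * xi for w, xi in zip(om[1:], x))


def network(x, beta, omega):
    """gamma(x) = sum_i beta_i * tanh(x_tilde . omega_i)."""
    return sum(b * math.tanh(pre_activation(x, om)) for b, om in zip(beta, omega))


def network_gradient(x, beta, omega):
    """d gamma / d x_k = sum_i beta_i * omega_ik * DG(x_tilde . omega_i)."""
    grad = [0.0] * len(x)
    for b, om in zip(beta, omega):
        dG = 1.0 - math.tanh(pre_activation(x, om)) ** 2  # derivative of tanh
        for k in range(len(x)):
            grad[k] += b * om[k + 1] * dG
    return grad


def delay_jacobian(x, beta, omega):
    """Jacobian of (y_t, ..., y_{t-(d-1)}) -> (y_{t+1}, y_t, ..., y_{t-(d-2)})."""
    d = len(x)
    rows = [network_gradient(x, beta, omega)]
    for i in range(d - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(d)])
    return rows


def finite_difference(x, beta, omega, h=1e-6):
    """Central-difference estimate of the gradient, used only as a check."""
    est = []
    for k in range(len(x)):
        up = x[:k] + [x[k] + h] + x[k + 1:]
        dn = x[:k] + [x[k] - h] + x[k + 1:]
        est.append((network(up, beta, omega) - network(dn, beta, omega)) / (2 * h))
    return est


rng = random.Random(0)
N, d = 4, 3  # illustrative sizes, not taken from the text
beta = [rng.random() for _ in range(N)]
omega = [[rng.gauss(0.0, 1.0) for _ in range(d + 1)] for _ in range(N)]
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]

analytic = network_gradient(x, beta, omega)
numeric = finite_difference(x, beta, omega)
J = delay_jacobian(x, beta, omega)
```

Only the first row of `J` depends on the weights; every other row contains a single 1, which is the sense in which the time-delayed form keeps the Jacobian simple.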



3. Neural networks as function approximations 

We will begin with a brief description of spaces of maps
useful for our purposes and conclude with the keynote
theorems of Hornik et al. necessary for our work.
Hornik et al. provided the theoretical justification for
the use of neural networks as function approximators.
The aforementioned authors provide a degree of generality
that we will not need; for their results in full generality
see [18].

FIG. 1: Schematic diagram of the Takens embedding theorem and how it applies to our construction.

The ability of neural networks to approximate functions
which are of particular interest can be most easily
seen via a brief discussion of Sobolev function space,
$S^m_p$. We will be brief, noting references Adams [23] and
Hebey for readers wanting more depth with respect
to Sobolev spaces. For the sake of clarity and simplification,
let us make a few remarks which will pertain to the
rest of this section:

i. $\mu$ is a measure; $\lambda$ is the standard Lebesgue measure;
for all practical purposes, $\mu = \lambda$;

ii. $l$, $m$ and $d$ are finite, non-negative integers; $m$ will
be with reference to a degree of continuity of some
function spaces, and $d$ will be the dimension of the
space we are operating on;

iii. $p \in R$, $1 \leq p < \infty$; $p$ will be with reference to a
norm, either the $L_p$ norm or the Sobolev norm;

iv. $U \subset R^d$, $U$ is measurable;

v. $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)^T$ is a $d$-tuple of non-negative
integers (or a multi-index) satisfying $|\alpha| = \alpha_1 +
\alpha_2 + \cdots + \alpha_d$, $|\alpha| \leq m$;

vi. for $x \in R^d$, $x^\alpha = x_1^{\alpha_1} \cdot x_2^{\alpha_2} \cdots x_d^{\alpha_d}$;

vii. $D^\alpha$ denotes the partial derivative of order $|\alpha|$

$$\frac{\partial^{|\alpha|}}{\partial x^\alpha} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_d^{\alpha_d}} \tag{8}$$

viii. $u \in L^1_{loc}(U)$ is a locally integrable, real valued
function on $U$;

ix. $\rho^m_{p,\mu}$ is a metric, dependent on the subset $U$, the
measure $\mu$, and $p$ and $m$ in a manner we will define
shortly;

x. $\| \cdot \|_p$ is the standard norm in $L_p(U)$.
Letting $m$ be a positive integer and $1 \leq p < \infty$, we
define the Sobolev norm, $\| \cdot \|_{m,p,U,\mu}$, as follows:

$$\|u\|_{m,p,U,\mu} = \left( \sum_{0 \leq |\alpha| \leq m} \|D^\alpha u\|_p^p \right)^{1/p} \tag{9}$$

where $u \in L^1_{loc}(U)$ is a locally integrable, real valued
function on $U \subset R^d$ ($u$ could be significantly more
general) and $\| \cdot \|_p$ is the standard norm in $L_p(U)$. Likewise,
the Sobolev metric can be defined:

$$\rho^m_{p,\mu}(f, g) = \|f - g\|_{m,p,U,\mu} \tag{10}$$

It is important to note that this metric is dependent on
$U$.

For ease of notation, let us define the set of $m$-times
differentiable functions on $U$,

$$C^m(U) = \{ f \in C(U) \mid D^\alpha f \in C(U),\ \|D^\alpha f\|_p < \infty\ \forall \alpha,\ |\alpha| \leq m \} \tag{11}$$

We are now free to define the Sobolev space for which
our results will apply.

Definition 1 For any positive integer $m$ and $1 \leq p < \infty$,
we define a Sobolev space $S^m_p(U, \lambda)$ as the vector space on
which $\| \cdot \|_{m,p}$ is a norm:

$$S^m_p(U, \lambda) = \{ f \in C^m(U) \mid \|D^\alpha f\|_{p,U,\lambda} < \infty \text{ for all } |\alpha| \leq m \} \tag{12}$$

Equipped with the Sobolev norm, $S^m_p$ is a Sobolev space
over $U \subset R^d$.

Two functions in $S^m_p(U, \lambda)$ are close in the Sobolev
metric if all the derivatives of order $0 \leq |\alpha| \leq m$ are
close in the $L_p$ metric. It is useful to recall that we
are attempting to approximate $\tilde{F} = g \circ F \circ g^{-1}$ where
$\tilde{F} : R^{2d+1} \to R^{2d+1}$; for this task the functions from
$S^m_p(U, \lambda)$ will serve us quite nicely. The whole point of
all this machinery is to state approximation theorems that
require specific notions of density. Otherwise we would refrain
and instead use the standard notion of $C^k$ functions:
the functions that are $k$-times differentiable uninhibited
by a notion of a metric or norm.
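
To see why closeness of derivatives is a separate requirement, the following sketch compares an $L_p$ distance with a Sobolev-type distance in one dimension. It assumes the standard form of the Sobolev norm (the $p$-th root of the sum of $p$-th powers of the $L_p$ norms of the derivatives up to order $m$), takes $U = (0, 1)$, $m = 1$, $p = 2$, and uses a hypothetical pair of functions that differ by a small, rapidly oscillating term.

```python
import math


def lp_norm(func, lo, hi, p, n=20000):
    """Midpoint-rule estimate of (integral of |func|^p over [lo, hi])^(1/p)."""
    h = (hi - lo) / n
    total = sum(abs(func(lo + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


def sobolev_distance_1d(derivs_f, derivs_g, lo, hi, p):
    """derivs_* = [u, Du, ..., D^m u]; combines the L_p distances of each order."""
    total = 0.0
    for df, dg in zip(derivs_f, derivs_g):
        total += lp_norm(lambda x: df(x) - dg(x), lo, hi, p) ** p
    return total ** (1.0 / p)


eps, k = 0.01, 200.0  # illustrative perturbation size and frequency

f = [math.sin, math.cos]  # f and Df
g = [
    lambda x: math.sin(x) + eps * math.sin(k * x),      # g
    lambda x: math.cos(x) + eps * k * math.cos(k * x),  # Dg
]

lp_dist = lp_norm(lambda x: f[0](x) - g[0](x), 0.0, 1.0, 2)
sob_dist = sobolev_distance_1d(f, g, 0.0, 1.0, 2)
```

Here `lp_dist` is small while `sob_dist` is of order one, so the two functions are close in $L_2$ but not in the Sobolev metric with $m = 1$.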

Armed with a specific function space for which the 
approximation results apply (there are many more), we 
will conclude this section by briefly stating one of the ap- 
proximation results. However, before stating the approx- 
imation theorem, we need two definitions — one which 
makes the notion of closeness of derivatives more pre- 
cise and one which gives the sufficient conditions for the 
activation functions to perform the approximations. 

Definition 2 ($m$-uniformly dense) Assume $m$ and
$l$ are non-negative integers, $0 \leq m \leq l$, $U \subset R^d$, and
$S \subset C^l(U)$. If, for any $f \in S$, compact $K \subset U$, and
$\epsilon > 0$ there exists a $g \in \Sigma(G)$ such that:

$$\max_{|\alpha| \leq m} \sup_{x \in K} |D^\alpha f(x) - D^\alpha g(x)| < \epsilon \tag{13}$$

then $\Sigma(G)$ is $m$-uniformly dense on compacta in $S$.

It is this notion of $m$-uniformly dense in $S$ that provides
all the approximation power of both the mappings and
the derivatives (up to order $l$) of the mappings. Next
we will supply the condition on our activation function
necessary for the approximation results.

Definition 3 ($l$-finite) Let $l$ be a non-negative integer.
$G$ is said to be $l$-finite for $G \in C^l(R)$ if:

$$0 < \int |D^l G| \, d\lambda < \infty \tag{14}$$

i.e. the $l$th derivative of $G$ must be both bounded away
from zero, and finite for all $l$ (recall $d\lambda$ is the standard
Lebesgue volume element).

The hyperbolic tangent, our activation function, is
$l$-finite.
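
As a quick numerical sanity check of this condition for the hyperbolic tangent, the sketch below estimates the integral of $|D^l \tanh|$ for $l = 1$ and $l = 2$ with a trapezoid rule. The integration window and step count are arbitrary assumptions; the tails of both integrands decay exponentially, so truncation is harmless. For $l = 0$ the integral of $|\tanh|$ itself diverges, so the check is only meaningful for $l \geq 1$.

```python
import math


def d1_tanh(x):
    """First derivative of tanh: sech^2(x)."""
    return 1.0 - math.tanh(x) ** 2


def d2_tanh(x):
    """Second derivative of tanh: -2 tanh(x) sech^2(x)."""
    return -2.0 * math.tanh(x) * (1.0 - math.tanh(x) ** 2)


def integral_abs(func, lo=-30.0, hi=30.0, n=60000):
    """Trapezoid estimate of the integral of |func| over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (abs(func(lo)) + abs(func(hi)))
    for i in range(1, n):
        total += abs(func(lo + i * h))
    return total * h


I1 = integral_abs(d1_tanh)  # exact value is tanh(inf) - tanh(-inf) = 2
I2 = integral_abs(d2_tanh)  # exact value is also 2
```

Both estimates are strictly positive and finite, consistent with the definition for these two orders.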

With these two notions, we can state one of the many 
existing approximation results. 

Corollary 1 (corollary 3.5 of Hornik et al.) If $G$ is $l$-finite, $0 \leq
m \leq l$, and $U$ is an open subset of $R^d$, then $\Sigma(G)$ is
$m$-uniformly dense on compacta in $S^m_p(U, \lambda)$ for $1 \leq p < \infty$.



In general, we wish to investigate differentiable mappings
of compact sets to themselves. Further, we wish for the
derivatives to be finite almost everywhere. Thus the
space $S^m_p(U, \lambda)$ will suffice for our purposes. Our results
also apply to piecewise differentiable mappings. However,
this requires a more general Sobolev space, $W^m_p$.
We have refrained from delving into the definition of this
space since it requires a bit more formalism; for those
interested see [20].

4. Our neural network construction 

The single layer feed-forward neural networks ($\gamma$'s from
the above section) we will consider are of the form

$$x_t = \beta_0 + \sum_{i=1}^{N} \beta_i G\left( s \omega_{i0} + s \sum_{j=1}^{d} \omega_{ij} x_{t-j} \right) \tag{15}$$

which is a map from $R^d$ to $R$. The squashing function $G$,
for our purpose, will be the hyperbolic tangent. In (15),
$N$ represents the number of hidden units or neurons, $d$ is
the input or embedding dimension of the system which
functions simply as the number of time lags, and $s$ is a
scaling factor on the weights.

The parameters are real ($\beta_i, \omega_{ij}, x_j, s \in R$) and the $\beta_i$'s
and $\omega_{ij}$'s are elements of weight matrices (which we hold
fixed for each case). The initial conditions are denoted
as $(x_0, x_1, \ldots, x_d)$, and $(x_t, x_{t+1}, \ldots, x_{t+d})$ represent the
current state of the system at time $t$.

We assume that the $\beta$'s are iid uniform over $[0, 1]$ and
then re-scaled to satisfy the condition $\sum_{i=1}^{N} \beta_i^2 = N$. The
$\omega_{ij}$'s are iid normal with zero mean and unit variance.
The $s$ parameter is a real number and can be interpreted
as the standard deviation of the $\omega$ matrix of weights.
The initial $x_j$'s are chosen iid uniform on the interval
$[-1, 1]$. All the weights and initial conditions are selected
randomly using a pseudo-random number generator.
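
The construction can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names are hypothetical, the normalization $\sum_i \beta_i^2 = N$ is assumed for the rescaling, $\beta_0$ is set to zero because its selection is not spelled out here, and the values of $N$, $d$, and $s$ are arbitrary.

```python
import math
import random


def make_network(N, d, rng):
    """Draw weights: beta iid U[0,1] then rescaled, omega iid N(0,1)."""
    beta = [rng.random() for _ in range(N)]
    # Assumed normalization: rescale so that sum_i beta_i^2 = N.
    scale = math.sqrt(N / sum(b * b for b in beta))
    beta = [b * scale for b in beta]
    beta0 = 0.0  # assumption: the choice of beta_0 is not specified in the text
    omega = [[rng.gauss(0.0, 1.0) for _ in range(d + 1)] for _ in range(N)]
    return beta0, beta, omega


def step(history, beta0, beta, omega, s):
    """history = [x_{t-1}, ..., x_{t-d}]; returns x_t."""
    total = beta0
    for b, om in zip(beta, omega):
        arg = s * om[0] + s * sum(w * x for w, x in zip(om[1:], history))
        total += b * math.tanh(arg)
    return total


def iterate(beta0, beta, omega, s, history, n_steps):
    """Iterate the scalar time-delay map, shifting the newest value in."""
    history = list(history)
    orbit = []
    for _ in range(n_steps):
        x_new = step(history, beta0, beta, omega, s)
        orbit.append(x_new)
        history = [x_new] + history[:-1]
    return orbit


rng = random.Random(0)
N, d, s = 8, 4, 1.0  # illustrative values only
beta0, beta, omega = make_network(N, d, rng)
x_init = [rng.uniform(-1.0, 1.0) for _ in range(d)]
orbit = iterate(beta0, beta, omega, s, x_init, 200)
```

Because $|\tanh| < 1$, every iterate is bounded in magnitude by $|\beta_0| + \sum_i |\beta_i|$, which under the assumed normalization is at most $N$, so the map sends a compact set to itself.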

We would like to make a few notes with respect to our
squashing function, tanh(). First, $\tanh(x)$, for $|x| \gg 1$,
will tend to behave much like a binary function. Thus,
the states of the neural network will tend toward the
finite set $(\beta_0 \pm \beta_1 \pm \beta_2 \cdots \pm \beta_N)$, or a set of $2^N$
different states. In the limit where the arguments of tanh()
become infinite, the neural network will have periodic
dynamics. Thus, if $\langle \beta \rangle$ or $s$ become very large, the
system will have a greatly reduced dynamic variability.
Based on this problem, one might feel tempted to bound
the $\beta$'s à la $\sum_{i=1}^{N} |\beta_i| = k$, fixing $k$ for all $N$ and $d$. This
is a bad idea, however, since if the $\beta_i$'s are restricted to
a sphere of radius $k$, as $N$ is increased, $\langle \beta_i^2 \rangle$ goes to
zero [23]. The other extreme of our squashing also yields
a very specific behavior type. For $x$ very near 0, the
$\tanh(x)$ function is nearly linear. Thus choosing $s$ small
will force the dynamics to be mostly linear, again yielding
fixed point and periodic behavior (no chaos). Thus
the scaling parameter $s$ will provide a unique bifurcation
parameter that will sweep from linear ranges to highly
non-linear ranges, to binary ranges: fixed points to chaos
and back to periodic phenomena.
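
The two limiting regimes of the squashing function can be checked directly. The sketch below is illustrative only: it measures how far tanh is from the identity near zero and from a sign function for large arguments, and enumerates the saturated output states for a small hypothetical weight vector.

```python
import math
from itertools import product


def linear_error(x):
    """Distance between tanh(x) and the linear approximation x."""
    return abs(math.tanh(x) - x)


def binary_error(x):
    """Distance between tanh(x) and the binary value sign(x)."""
    return abs(math.tanh(x) - math.copysign(1.0, x))


def saturated_states(beta0, beta):
    """All values beta0 +/- beta_1 +/- ... +/- beta_N (the fully saturated limit)."""
    states = set()
    for signs in product((-1.0, 1.0), repeat=len(beta)):
        states.add(beta0 + sum(sg * b for sg, b in zip(signs, beta)))
    return sorted(states)


small = linear_error(0.01)   # roughly x^3 / 3, so about 3e-7
large = binary_error(10.0)   # roughly 2 exp(-20), so about 4e-9

example_beta = [0.5, 1.0, 2.0]  # hypothetical weights with N = 3
states = saturated_states(0.0, example_beta)
```

For generic weights all $2^N$ sign combinations give distinct values; here the three weights yield eight saturated states.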

Note that in a very practical sense, the measure we 
are imposing on the set of neural networks is our means 
of selecting the weights that define the networks. This 
will introduce a bias into our results that is unavoidable 
in such experiments; the very act of picking networks 
out of the space will determine, to some extent, our re- 
sults. Unlike actual physical experiments, we could, in 
principle, prove an invariance of our results to our in- 
duced measure. This is difficult and beyond the scope 
of this paper. Instead it will suffice for our purposes to 
note specifically what our measure is (our weight selec- 
tion method), and how it might bias our results. Our 
selection method will include all possible networks, but 
clearly not with the same likelihood. In the absence of 
a theorem with respect to an invariance of our induced 
measure, we must be careful in stating what our results 
imply about the ambient function space. 

B. Characteristic Lyapunov exponents and 
Hyperbolicity 

Let us now define the diagnostics for our numerical 
simulations. We will begin by defining structural sta- 
bility and its relevant notion of topological equivalence 
(between orbits, attractors, etc), topological conjugacy. 
We will then discuss notions that are more amenable to 
a numerical study, yet can be related to the geometri- 
cal notions of structural stability. Hyperbolicity will be 
defined in three successive definitions, each with increas- 
ing generality, culminating with a definition of partial 
hyperbolicity. This will be followed with a global gen- 
eralization of local eigenvalues, the Lyapunov spectrum. 
We will include here a brief statement regarding the con- 
nection between structural stability, hyperbolicity, and 
the Lyapunov spectrum. 

Definition 4 (Structural Stability) A $C^r$ discrete-time
map, $f$, is structurally stable if there is a $C^r$
neighborhood, $V$ of $f$, such that any $g \in V$ is topologically
conjugate to $f$, i.e. for every $g \in V$, there exists a
homeomorphism $h$ such that $f = h^{-1} \circ g \circ h$.

In other words, a map is structurally stable if, for all
other maps $g$ in a $C^r$ neighborhood, there exists a
homeomorphism that will map the domain of $f$ to the domain
of $g$, the range of $f$ to the range of $g$, and the inverses
respectively. This is a purely topological notion.

Next, let us begin by defining hyperbolicity in an intu- 
itive manner, followed by a more general definition useful 
for our purposes. Let us start with a linear case: 

Definition 5 (Hyperbolic linear map) A linear map
of $R^n$ is called hyperbolic if all of its eigenvalues have
modulus different from one.
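
For a concrete feel for this definition, the sketch below tests two $2 \times 2$ linear maps by computing their eigenvalues from the characteristic polynomial. The matrices are illustrative choices and the tolerance `tol` is a numerical assumption, not part of the definition.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return ((tr + disc) / 2.0, (tr - disc) / 2.0)


def is_hyperbolic(matrix, tol=1e-12):
    """True if no eigenvalue has modulus one (within tol)."""
    (a, b), (c, d) = matrix
    return all(abs(abs(lam) - 1.0) > tol for lam in eigenvalues_2x2(a, b, c, d))


cat_map = ((2.0, 1.0), (1.0, 1.0))    # eigenvalues (3 +/- sqrt(5)) / 2
rotation = ((0.0, -1.0), (1.0, 0.0))  # eigenvalues +/- i, both of modulus one

cat_is_hyperbolic = is_hyperbolic(cat_map)
rotation_is_hyperbolic = is_hyperbolic(rotation)
```

The first matrix has one expanding and one contracting direction and is hyperbolic; the rotation has both eigenvalues on the unit circle and is not.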



The above definition can be generalized as follows:

Definition 6 (Hyperbolic map) A discrete-time map
$f$ is said to be hyperbolic on a compact invariant set $\Lambda$ if
there exists a continuous splitting of the tangent bundle,
$TM|_\Lambda = E^s \oplus E^u$, and there are constants $C > 0$, $0 <
\lambda < 1$, such that $\|Df^n|_{E^s_x}\| < C\lambda^n$ and $\|Df^{-n}|_{E^u_x}\| <
C\lambda^n$ for any $n > 0$ and $x \in \Lambda$.

Here the stable bundle $E^s$ (respectively unstable bundle
$E^u$) of $x \in \Lambda$ is the set of points $p \in M$ such that $|f^k(x) -
f^k(p)| \to 0$ as $k \to \infty$ ($k \to -\infty$ respectively).

As previously mentioned, strict hyperbolicity is a bit 
restrictive; thus let us make precise the notion of a "little 
bit" of hyperbolicity: 

Definition 7 (Partial hyperbolicity) The diffeomorphism
$f$ of a smooth Riemannian manifold $M$ is said to
be partially hyperbolic if for all $x \in M$ the tangent bundle
$T_x M$ has the invariant splitting:

$T_x M = E^s(x) \oplus E^c(x) \oplus E^u(x)$ (16)

into strong stable $E^s(x) = E^s_f(x)$, strong unstable
$E^u(x) = E^u_f(x)$, and central $E^c(x) = E^c_f(x)$ bundles,
at least two of which are non-trivial [64]. Thus there will
exist numbers $0 < a < b < 1 < c < d$ such that, for all
$x \in M$:

$v \in E^u(x) \Rightarrow d\|v\| < \|D_x f(v)\|$ (17)
$v \in E^c(x) \Rightarrow b\|v\| < \|D_x f(v)\| < c\|v\|$ (18)
$v \in E^s(x) \Rightarrow \|D_x f(v)\| < a\|v\|$ (19)

More specific characteristics and definitions can be found 
in references "255, H, [l|, 0], and jl^. The key pro- 
vided by definition 7 is the allowance of center bundles,
zero Lyapunov exponents, and in general, neutral di- 
rections, which are not allowed in strict hyperbolicity. 
Thus we are allowed to keep the general framework of 
good topological structure, but we lose structural sta- 
bility. With non-trivial partial hyperbolicity (i.e. $E^c$ is
not null), stable ergodicity replaces structural stability 
as the notion of dynamic stability in the Pugh-Shub sta- 
bility conjecture (conjecture © of 0). Thus what is 
left is to again attempt to show the extent to which sta- 
ble ergodicity persists, and topological variation is not 
pathological, under parameter variation with non-trivial 
center bundles present. Again, we note that results in 
this area will be discussed in a later section. 

In numerical simulations we will never observe an or- 
bit on the unstable, stable, or center manifolds. Thus 
we will need a global notion of stability averaged along a 
given orbit (which will exist under weak ergodic assump- 
tions). The notion we seek is captured by the spectrum 
of Lyapunov exponents. 

We will initially define Lyapunov exponents formally, 
followed by a more practical, computational definition. 

Definition 8 (Lyapunov Exponents) Let $f : M \to M$
be a diffeomorphism (i.e. discrete time map) on a
compact Riemannian manifold of dimension $m$. Let $\|\cdot\|$
be the norm on the tangent vectors induced by the
Riemannian metric on $M$. For every $x \in M$ and $v \in T_x M$
the Lyapunov exponent at $x$ is denoted:

$\chi(x, v) = \limsup_{t \to \infty} \frac{1}{t} \log \|Df^t v\|$ (20)

Assume the function $\chi(x, \cdot)$ has only finitely many values
on $T_x M \setminus \{0\}$ (this assumption may not be true for our
dynamical systems) which we denote
$\chi_1(x) < \chi_2(x) < \cdots < \chi_m(x)$. Next denote the filtration
of $T_x M$ associated with $\chi(x, \cdot)$,
$\{0\} = V_0(x) \subset V_1(x) \subset \cdots \subset V_m(x) = T_x M$,
where $V_i(x) = \{v \in T_x M \mid \chi(x, v) \le \chi_i(x)\}$. The number
$k_i = \dim(V_i(x)) - \dim(V_{i-1}(x))$ is the multiplicity
of the exponent $\chi_i(x)$. In general, for our networks over
the parameter range we are considering, $k_i = 1$ for all
$0 < i \le m$. Given the above, the Lyapunov spectrum for
$f$ at $x$ is defined:

$Sp\,\chi(x) = \{\chi_i(x) \mid 1 \le i \le m\}$ (21)

(For more information regarding Lyapunov exponents 
and spectra see js^l, or [3ll |. 

A more computationally motivated formula for the 
Lyapunov exponents is given as: 

$\chi_j = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \ln\left(\langle (Df_k \cdot \delta x_j)^T, (Df_k \cdot \delta x_j) \rangle\right)$ (22)

where $\langle \cdot, \cdot \rangle$ is the standard inner product, $\delta x_j$ is the
$j$-th component of the $x$ variation [65] and $Df_k$ is the
"orthogonalized" Jacobian of $f$ at the $k$-th iterate of $f(x)$.
Through the course of our discussions we will dissect
equation (22) further. It should also be noted that Lya-
punov exponents have been shown to be independent of 
coordinate system, thus the specifics of our above defini- 
tion do not affect the outcome of the exponents. 
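
To make the idea of an "orthogonalized" Jacobian concrete, the following is a minimal sketch of the standard QR re-orthogonalization scheme for estimating a Lyapunov spectrum. It is not necessarily the procedure used for the networks in this report: the Hénon map stands in for the neural network, and the names `lyapunov_spectrum`, `henon_step`, `henon_jacobian`, and the iteration counts are all illustrative assumptions. The sketch accumulates the logarithms of the stretching factors (the diagonal of the R factor) rather than logarithms of inner products.

```python
import numpy as np

def lyapunov_spectrum(step, jacobian, x0, n_iter=20000, n_transient=1000):
    """Estimate the Lyapunov spectrum of a discrete-time map by repeated QR factorization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_transient):      # discard the transient so the orbit is near the attractor
        x = step(x)
    d = x.size
    q = np.eye(d)                     # orthonormal frame of tangent vectors
    log_sums = np.zeros(d)
    for _ in range(n_iter):
        # Push the frame forward with the local Jacobian, then re-orthogonalize.
        q, r = np.linalg.qr(jacobian(x) @ q)
        log_sums += np.log(np.abs(np.diag(r)))  # stretching factor along each direction
        x = step(x)
    return np.sort(log_sums / n_iter)[::-1]

# Stand-in map (assumption): the Henon map with its usual parameter values.
A_HENON, B_HENON = 1.4, 0.3

def henon_step(x):
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])

def henon_jacobian(x):
    return np.array([[-2.0 * A_HENON * x[0], 1.0], [B_HENON, 0.0]])

spectrum = lyapunov_spectrum(henon_step, henon_jacobian, [0.0, 0.0])
print(spectrum)  # one positive and one negative exponent
```

A useful sanity check for this stand-in map is that the exponents must sum to the logarithm of the absolute Jacobian determinant, which here is the constant $\ln 0.3$.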

The existence of Lyapunov exponents is established by
a multiplicative ergodic theorem (for a nice example, see
theorem (1.6) in [32]). There exist many such theorems
for various circumstances. The first multiplicative er- 
godic theorem was proven by Oseledec js^l ; many others - 
[Hi, |36(, Isl, lai, and m - have subsequently 

generalized his original result. We will refrain from stat- 
ing a specific multiplicative ergodic theorem; the condi- 
tions necessary for the existence of Lyapunov exponents 
are exactly the conditions we place on our function space 
in section (II C). In other words, a $C^r$ ($r > 0$) map
of a compact manifold $M$ to itself and an $f$-invariant
probability measure $\rho$ on $M$. For specific treatments we
leave the curious reader to study the aforementioned ref- 
erences, noting that our construction follows from |35l |. 
m, and 13. 

There is an intimate relationship between Lyapunov 
exponents and global stable and unstable manifolds. In 
fact, each Lyapunov exponent corresponds to a global 
manifold. We will be using the global manifold structure 



as our measure of topological equivalence, and the Lya- 
punov exponents to classify this global structure. Posi- 
tive Lyapunov exponents correspond to global unstable 
manifolds, and negative Lyapunov exponents correspond 
to global stable manifolds. We will again refrain from 
stating the existence theorems for these global manifolds, 
and instead note that in addition to the requirements for 
the existence of Lyapunov exponents, the existence of 
global stable/unstable manifolds corresponding to the
negative/positive Lyapunov exponents requires $Df$ to be
injective. For specific global unstable/stable manifold
theorems see [33].

The theories of hyperbolicity, Lyapunov exponents and 
structural stability have had a long, wonderful, and tan- 
gled history (for good starting points see *3^ or We 
will, of course, not scratch the surface with our current 
discussion, but rather put forth the connections relevant 
for our work. Lyapunov exponents are the logarithmic 
average of the (properly normalized) eigenvalues of the 
local (linearization at a point) Jacobian along a given or- 
bit. Thus for periodic orbits, the Lyapunov exponents
are simply the logarithms of the moduli of the eigenvalues
of the time-$p$ map, divided by the period $p$. A periodic
orbit with period $p$ is hyperbolic if none of the eigenvalues
of the time-$p$ map has modulus one or, equivalently, if none
of the Lyapunov exponents is zero. The connection between
structural stability and hyperbolicity is quite beautiful and
has a long and wonderful history beginning with Palis and
Smale [41].
For purposes of interpretation later, it will be useful to 
state the solution of the stability conjecture: 

Theorem 2 (Mane 01 theorem A, Robbin [5], 
Robinson [42]) A diffeomorphism (on a compact, 
boundaryless manifold) is structurally stable if and only 
if it satisfies axiom A and the strong transversality con- 
dition. 

Recall that axiom A says the diffeomorphism is hyperbolic
with dense periodic points on its non-wandering set
($x \in \Omega$ is non-wandering if for any neighborhood $U$ of
$x$, there is an $n > 0$ such that $f^n(U) \cap U \neq \emptyset$). We will
save a further explicit discussion of this interrelationship 
for a later section, noting that much of this report inves- 
tigates the above notions and how they apply to our set 
of maps. 

Finally, for a nice, sophisticated introduction to the 
above topics see [s^ . 



C. Conditions needed for the existence of 
Lyapunov exponents 

Lyapunov exponents are one of our principal diagnos- 
tics, thus we must briefly justify their existence for our 
construction. We will begin with a standard construc- 
tion for the existence and computation of Lyapunov ex- 
ponents as defined by the theories of Katok b4| , Ruelle 
1|5|, |32], Pesin [H, [33, HI, Brin and Pesin 25], and 
Burns, Dolgopyat and Pesin !27j. We will then note how 
this applies to our construction. (For more practical ap-
proaches to the numerical calculation of Lyapunov spec- 
tra see [33, lil, and 

Let $\mathcal{H}$ be a separable real Hilbert space (for practical
purposes $R^n$), and let $X$ be an open subset of $\mathcal{H}$. Next let
$(X, \Sigma, \rho)$ be a probability space where $\Sigma$ is a $\sigma$-algebra
of sets, and $\rho$ is a probability measure, $\rho(X) = 1$ (see
for more information). Now consider a $C^r$ ($r > 1$) map
$f_t : X \to X$ which preserves $\rho$ ($\rho$ is $f$-invariant) defined
for $t \ge T_0 \ge 0$ such that $f_{t_1 + t_2} = f_{t_1} \circ f_{t_2}$ and that
$(x, t) \mapsto f_t(x), Df_t(x)$ is continuous from $X \times [T_0, \infty)$
to $X$ and bounded on $\mathcal{H}$. Assume that $f$ has a compact
invariant set

$\Lambda = \{ \bigcap_{t > T_0} f_t(X) \mid f_t(\Lambda) \subseteq \Lambda \}$ (23)

and $Df_t$ is a compact bounded operator for $x \in \Lambda$, $t > T_0$.
Finally, endow $f_t$ with a scalar parameter $s \in [0, \infty]$.
This gives us the space (a metric space - the metric will
be defined heuristically in section (II A 4)) of one parameter,
measure-preserving maps from bounded compact
sets to themselves with bounded first derivatives. It is
for a space of the above mappings that Ruelle shows the
existence of Lyapunov exponents [35] (similar requirements
are made by Brin and Pesin [25] in a slightly more
general setting).

Now we must quickly justify our use of Lyapunov expo- 
nents. Clearly, we can take X in the above construction 
to be the of section Ijll A As our neural networks 
map their domains to compact sets, and they are con- 
structed as time-delays, their domains are also compact. 
Further, their derivatives are bounded up to arbitrary or- 
der, although for our purposes, only the first order need 
be bounded. Because the neural networks are determinis- 
tic and bounded, there will exist an invariant set of some 
type. All that remains is the issue of measure preservation,
which we have not yet mentioned. This issue
is partially addressed in [43] in our neural network con-
text. There remains much work to achieve a full under- 
standing of Lyapunov exponents for general dissipative 
dynamical systems that are not absolutely continuous, 
for a current treatment see The specific measure 

theoretic properties of our networks (i.e. issues such as 
absolute continuity, uniform/non- uniform hyperbolicity, 
basin structures, etc) are a topic of current investigation.



D. Definitions for numerical arguments 

Since we are conducting a numerical experiment, we 
will present some notions needed to test our conjectures 
numerically. We will begin with a notion of continuity. 
The heart of continuity is based on the following idea: if 
a neighborhood about a point in the domain is shrunk, 
this implies a shrinking of a neighborhood of the range. 
However, we do not have infinitesimals at our disposal. 
Thus, our statements of numerical continuity will neces- 



sarily have a statement regarding the limits of numerical 
resolution below which our results are uncertain. 

Let us now begin with a definition of bounds on the 
domain and range: 

Definition 9 ($\epsilon_{num}$) $\epsilon_{num}$ is the numerical accuracy of
a Lyapunov exponent, $\chi_j$.

Definition 10 ($\delta_{num}$) $\delta_{num}$ is the numerical accuracy
of a given parameter under variation.

Now, with our $\epsilon_{num}$ and $\delta_{num}$ defined as our numerical
limits in precision, let us define numerical continuity of
Lyapunov exponents.

Definition 11 (num-continuous Lyapunov exponents)
Given a one parameter map $f : R^1 \times R^d \to R^d$,
$f \in C^r$, $r > 0$, for which characteristic exponents $\chi_j$
exist (and are the same under all invariant measures), the
map $f$ is said to have num-continuous Lyapunov exponents
at $(\mu, x) \in R^1 \times R^d$ if for $\epsilon_{num} > 0$ there exists a
$\delta_{num} > 0$ such that if:

$|s - s'| < \delta_{num}$ (24)

then

$|\chi_j(s) - \chi_j(s')| < \epsilon_{num}$ (25)

for $s, s' \in R^1$, for all $j \in N$ such that $0 < j \le d$.

Another useful definition related to continuity is that of 
a function being Lipschitz continuous. 

Definition 12 (num-Lipschitz) Given a one parameter
map $f : R^1 \times R^d \to R^d$, $f \in C^r$, $r > 0$, for which
characteristic exponents $\chi_j$ exist (and are the same under
all invariant measures), the map $f$ is said to have
num-Lipschitz Lyapunov exponents at $(\mu, x) \in R^1 \times R^d$
if there exists a real constant $0 < k_{\chi_j}$ such that

$|\chi_j(s) - \chi_j(s')| < k_{\chi_j} |s - s'|$ (26)

Further, if the constant $k_{\chi_j} < 1$, the Lyapunov exponent
is said to be contracting [66] on the interval $[s, s']$ for all
$s'$ such that $|s - s'| < \delta_{num}$.

Note that neither of these definitions implies strict conti-
nuity, but rather, they provide bounds on the difference 
between the change in parameter and the change in Lya- 
punov exponents. It is important to note that these no- 
tions are highly localized with respect to the domain in 
consideration. We will not imply some sort of global 
continuity using the above definitions, rather, we will 
use these notions to imply that Lyapunov exponents will 
continuously (within numerical resolution) cross through 
zero upon parameter variation. We can never numerically 
prove that Lyapunov exponents don't jump across zero, 
but for most computational exercises, a jump across zero 
that is below numerical precision is not relevant. This 
notion of continuity will aid in arguments regarding the 
existence of periodic windows in parameter space. 
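
The two numerical notions above can be checked directly on sampled data. The following is a minimal sketch, assuming an exponent has already been computed on a grid of parameter values; the function names and the synthetic data are illustrative assumptions, not results from our networks.

```python
import numpy as np

def num_lipschitz_constant(s, chi):
    """Largest observed |chi(s) - chi(s')| / |s - s'| over adjacent parameter samples."""
    s = np.asarray(s, dtype=float)
    chi = np.asarray(chi, dtype=float)
    return float(np.max(np.abs(np.diff(chi)) / np.abs(np.diff(s))))

def is_num_continuous(s, chi, eps_num, delta_num):
    """Check |chi(s) - chi(s')| < eps_num for every sampled pair with |s - s'| < delta_num."""
    s = np.asarray(s, dtype=float)
    chi = np.asarray(chi, dtype=float)
    ds = np.abs(s[:, None] - s[None, :])        # all pairwise parameter distances
    dchi = np.abs(chi[:, None] - chi[None, :])  # all pairwise exponent differences
    return bool(np.all(dchi[ds < delta_num] < eps_num))

# Synthetic data (assumption): one smoothly varying exponent and one that jumps.
s_grid = np.linspace(0.0, 1.0, 101)
chi_smooth = 0.5 - 0.5 * s_grid
chi_jump = np.where(s_grid < 0.5, 0.2, -0.2)

print(num_lipschitz_constant(s_grid, chi_smooth))
print(is_num_continuous(s_grid, chi_smooth, eps_num=0.01, delta_num=0.015))
print(is_num_continuous(s_grid, chi_jump, eps_num=0.01, delta_num=0.015))
```

Such a check can only refute continuity at the chosen resolution; passing it says nothing about behavior between samples.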

Let us next define a Lyapunov exponent zero-crossing: 
Definition 13 (Lyapunov exponent zero-crossing)
A Lyapunov exponent zero-crossing is simply the point
$s_{\chi_j}$ in parameter space such that a Lyapunov exponent
continuously (or num-continuously) crosses zero, e.g.
for $s - \delta$, $\chi_i > 0$, and for $s + \delta$, $\chi_i < 0$.

For this report, a Lyapunov exponent zero-crossing is
a transverse intersection with the real line. For our networks
non-transversal intersections of the Lyapunov exponents
with the real line certainly occur, but for the
portion of parameter space we are investigating, they are
extremely rare. Along the route-to-chaos for our networks,
such non-transversal intersections are common,
but we will save the discussion of that topic for a different
report. Orbits for which the Lyapunov spectrum can be
defined (in a numerical sense, Lyapunov exponents are
defined when they are convergent), yet at least one of the
exponents is zero, are called non-trivially num-partially
hyperbolic. We must be careful making statements with
respect to the existence of zero Lyapunov exponents implying
the existence of corresponding center manifolds $E^c$, as
we can do with the positive and negative exponents and
their respective stable and unstable manifolds.
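
A transverse zero-crossing can be located on sampled data by looking for a strict sign change between neighboring parameter values. The sketch below is one simple way to do this, using linear interpolation between the bracketing samples; the function name and the test curves are illustrative assumptions.

```python
import numpy as np

def zero_crossings(s, chi):
    """Parameter values where a sampled exponent changes sign (linear interpolation)."""
    s = np.asarray(s, dtype=float)
    chi = np.asarray(chi, dtype=float)
    crossings = []
    for i in range(len(s) - 1):
        if chi[i] * chi[i + 1] < 0.0:  # strict sign change: a transverse crossing
            t = chi[i] / (chi[i] - chi[i + 1])  # fraction of the way to the next sample
            crossings.append(float(s[i] + t * (s[i + 1] - s[i])))
    return crossings

s_grid = np.linspace(0.0, 2.0, 21)
chi_transverse = 0.95 - s_grid       # crosses zero once, at s = 0.95
chi_tangent = (s_grid - 1.0) ** 2    # touches zero at s = 1 without crossing

print(zero_crossings(s_grid, chi_transverse))
print(zero_crossings(s_grid, chi_tangent))
```

The strict inequality is a deliberate choice: an exponent that only touches zero, as in the tangent example, is not reported as a crossing.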

Lastly, we define a notion of denseness for a numeri- 
cal context. There are several ways of achieving such a 
notion; we will use the notion of a dense sequence.

Definition 14 ($\epsilon$-dense) Given an $\epsilon > 0$, an open
interval $(a, b) \subset R$, and a sequence $\{c_1, \ldots, c_n\}$,
$\{c_1, \ldots, c_n\}$ is $\epsilon$-dense in $(a, b)$ if there exists an $n$ such
that for any $x \in (a, b)$, there is an $i$, $1 \le i \le n$, such that
$dist(x, c_i) < \epsilon$.
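
For a finite set of points, this definition reduces to a check on the largest gap. The following sketch assumes the points lie inside the interval; the function name is an illustrative assumption, and the check is slightly conservative at the two ends of the interval because it uses a strict inequality there as well.

```python
import numpy as np

def is_eps_dense(points, a, b, eps):
    """True if every x in (a, b) lies within eps of some point in `points`."""
    c = np.sort(np.asarray(points, dtype=float))
    if c.size == 0:
        return False
    # Worst cases: the two ends of the interval and the midpoint of the widest gap.
    worst = max(c[0] - a, b - c[-1], np.max(np.diff(c), initial=0.0) / 2.0)
    return bool(worst < eps)

evenly_spread = np.linspace(0.05, 0.95, 10)  # ten points, spacing 0.1
clustered = [0.10, 0.11, 0.12]               # three points bunched near the left end

print(is_eps_dense(evenly_spread, 0.0, 1.0, eps=0.06))
print(is_eps_dense(evenly_spread, 0.0, 1.0, eps=0.04))
print(is_eps_dense(clustered, 0.0, 1.0, eps=0.06))
```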

In reality however, we will be interested in a sequence
of sequences that are "increasingly" $\epsilon$-dense in an interval
$(a, b)$. In other words, for the sequence of sequences

$c^1_1, \ldots, c^1_{n_1}$
$c^2_1, \ldots, c^2_{n_2}$
$\vdots$

where $n_{i+1} > n_i$ (i.e. for a sequence of sequences with
increasing cardinality), the subsequent sequences for
increasing $n_i$ become a closer approximation of an $\epsilon$-dense
sequence. Formally:

Definition 15 (Asymptotically Dense (a-dense))
A sequence $S_j = \{c^j_1, \ldots, c^j_{n_j}\} \subset (a, b)$ of finite subsets is
asymptotically dense in $(a, b)$, if for any $\epsilon > 0$, there is
an $N$ such that $S_j$ is $\epsilon$-dense if $j > N$.

For an intuitive example of this, consider a sequence $S$ of $k$
numbers where $q_k \in S$, $q_k \in (0, 1)$. Now increase the car-
dinality of the set, spreading elements in such a way that 
they are uniformly distributed over the interval. Density 
is achieved with the cardinality of infinity, but clearly, 
with a finite but arbitrarily high number of elements, we 
can achieve any approximation to a dense set that we 
wish. There are, of course, many ways we can have a 
countably infinite set that is not dense, and, as we are 



working with numerics, we must concern ourselves with 
how we will approach this asymptotic density. We now 
need a clear understanding of when this definition will ap- 
ply to a given set. There are many pitfalls; for instance, 
we wish to avoid sequences such as $(1, 1/2, 1/3, \ldots, 1/n, \ldots)$.
We will, in the section that addresses a-density, state
the necessary conditions for an a-dense set for our pur-
poses. 
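
The difference between a family of finite sets that becomes dense and one that does not can be seen numerically by tracking the covering radius, i.e. the smallest $\epsilon$ for which the set is (non-strictly) $\epsilon$-dense. The sketch below compares evenly spread points with the first terms of $1, 1/2, 1/3, \ldots$; all function names are illustrative assumptions.

```python
import numpy as np

def covering_radius(points, a, b):
    """Largest distance from any x in (a, b) to its nearest point in `points`."""
    c = np.sort(np.asarray(points, dtype=float))
    return float(max(c[0] - a, b - c[-1], np.max(np.diff(c), initial=0.0) / 2.0))

def uniform_family(n, a=0.0, b=1.0):
    """The n cell midpoints of an even partition of (a, b)."""
    return a + (b - a) * (np.arange(n) + 0.5) / n

def harmonic_family(n):
    """The first n terms of 1, 1/2, 1/3, ..., which accumulate only at 0."""
    return 1.0 / np.arange(1, n + 1)

for n in (10, 100, 1000):
    print(n,
          covering_radius(uniform_family(n), 0.0, 1.0),
          covering_radius(harmonic_family(n), 0.0, 1.0))
```

For the evenly spread family the covering radius is $1/(2n)$ and tends to zero, while for the harmonic family it is fixed by the gap between $1/2$ and $1$ no matter how many terms are added.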



III. CONJECTURES 

The point of this exercise is verifying three properties 
of $C^r$ maps along a one-dimensional interval in parame-
ter space. The first property is the existence of a collec- 
tion of points along an interval in parameter space such 
that hyperbolicity of the mapping is violated. The sec- 
ond property, which is really dependent upon the first 
and third properties, is the existence of an interval in 
parameter space of positive measure such that topologi- 
cal change (in the sense of changing numbers of unstable 
manifolds) with respect to slight parameter variation on 
the aforementioned interval is common. The final prop- 
erty we wish to show, which will be crucial for arguing the 
second property, is that on the aforementioned interval 
in parameter space, the topological change will not yield 
periodic windows in the interval if the dimension of the 
mapping is sufficiently high. More specifically, we will 
show that the ratio of periodic window size to parameter 
variation size ($\delta s$) goes to zero on our chosen interval.

Condition 1 Given a map (neural network) as defined
in section (II A 4), if the parameter $s \in R^1$ is varied
num-continuously, then the Lyapunov exponents vary
num-continuously.



There are many counterexamples to this condition, so 
many of our results will rest upon our ability to show how 
generally the above condition applies in high-dimensional 
systems. 

Definition 16 (Chain link set) Assume $f$ is a mapping
(neural network) as defined in section (II A 4). A
chain link set is denoted:

$V = \{ s \in R \mid \chi_j(s) \neq 0 \text{ for all } 0 < j \le d \text{ and } \chi_j(s) > 0 \text{ for some } j > 0 \}$

If $\chi_j(s)$ is continuous at its Lyapunov exponent
zero-crossing, as we will show later (a la condition 1), then
$V$ is open. Next, let $C_k$ be a connected component of
the closure of $V$, $\overline{V}$. It can be shown that $C_k \cap V$ is
a union of disjoint, adjacent open intervals of the form
$\bigcup_i (a_i, a_{i+1})$.

Definition 17 (Bifurcation link set) Assume $f$ is a
mapping (neural network) as defined in section (II A 4).
Denote a bifurcation link set of $C_k \cap V$ as:

$V_i = (a_i, a_{i+1})$ (27)

Assume the number of positive Lyapunov exponents for
each $V_i \subset V$ remains constant. If, upon a monotonically
increasing variation in the parameter $s$, the number of
positive Lyapunov exponents for $V_i$ is greater than the
number of positive Lyapunov exponents for $V_{i+1}$, $V$ is said
to be LCE decreasing. Specifically, the endpoints of the $V_i$'s
are the points where there exist Lyapunov exponent zero
crossings. We are not particularly interested in these
sets, however; rather, we are interested in the collection of
endpoints adjoining these sets.
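
On sampled data, the LCE decreasing property amounts to the count of positive exponents never rising, and dropping at least once, as the parameter increases. The following sketch checks this for a synthetic spectrum; the array layout, the function names, and the linear exponent curves are illustrative assumptions rather than data from our networks.

```python
import numpy as np

def positive_exponent_counts(spectra):
    """spectra[i, j] is exponent j at the i-th parameter sample; count positives per sample."""
    return np.sum(np.asarray(spectra, dtype=float) > 0.0, axis=1)

def is_lce_decreasing(spectra):
    """True if the number of positive exponents never increases and drops at least once."""
    steps = np.diff(positive_exponent_counts(spectra))
    return bool(np.all(steps <= 0) and np.any(steps < 0))

# Synthetic spectrum (assumption): three exponents that fall linearly through zero
# at s = 1.05, 1.55 and 2.05 as the parameter s increases.
s_grid = np.linspace(0.5, 3.0, 26)
offsets = np.array([1.05, 1.55, 2.05])
spectra = offsets[None, :] - s_grid[:, None]

counts = positive_exponent_counts(spectra)
print(counts)
print(is_lce_decreasing(spectra))
```

Each drop in the count marks the right endpoint of one link and the left endpoint of the next, so the drop locations are the sampled approximation of the endpoints discussed above.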

Definition 18 (Bifurcation chain subset) Let $V$ be
a chain link set, and $C_k$ a connected component of $\overline{V}$. A
bifurcation chain subset of $C_k \cap V$ is denoted:

$U_k = \{a_i\}$ (28)

or equivalently:

$U_k = \partial(C_k \cap V)$ (29)

For our purposes in this work, we will consider a
bifurcation chain subset $U$ such that $a_1$ corresponds to the
last zero crossing of the least positive exponent and $b_n$
will depend upon the specific case and dimension. In a
practical sense, $a_1 \sim 0.5$ and $b_n \sim 6$. For higher
dimensional networks, $b_n \sim 6$ will correspond to a much
higher $n$ than for a low-dimensional network. For an
intuitive picture of what we wish to depict with the above
definitions, consider figure (2).

We will now state the conjectures, followed by some
definitions and an outline of what we will test and why
those tests will verify our claims.

Conjecture 1 (Hyperbolicity violation) Assume $f$
is a mapping (neural network) as defined in section
(II A 4) with a sufficiently high number of dimensions,
$d$. There exists at least one bifurcation chain subset $U$.

The intuition arises from a straightforward considera- 
tion of the neural network construction in section (II A 4).
From consideration of our specific neural networks and 
their activation function, tanh(), it is clear that varia- 
tion of the scaling parameter, s, on the variance of the 
interaction weights uj forces the neural networks from a 
linear region, through a non-linear region, and into a bi- 
nary region. This implies that, given a neural network 
that is chaotic for some value of s, upon the monotoni- 
cally increasing variation of s from zero, the dynamical 
behavior will begin at a fixed point, proceed through a 
sequence of bifurcations, become chaotic, and eventually 
become periodic. If the number of positive Lyapunov ex- 
ponents can be shown to increase with the dimension of 
the network and if the Lyapunov exponents can be shown 
to vary relatively continuously with respect to parame- 
ter variation with increasing dimension, then there will 
be many points along the parameterized curve such that 
there will exist neutral directions. The ideas listed above 
provide the framework for computational verification of 
conjecture (j2Jl. We must investigate conjecture with 



respect to the subset $U$ becoming a-dense in its closure
and the existence of very few (ideally a single) connected 
components of V. 

Conjecture 2 (Existence of a Codimension $\epsilon$ bifurcation
set) Assume $f$ is a mapping (neural network) as
defined in section (II A 4) with a sufficiently high number
of dimensions, $d$, and a bifurcation chain set $U$ as per
conjecture (1). The two following (equivalent) statements
hold:

i. In the infinite-dimensional limit, the cardinality of
$U$ will go to infinity, and the length $\max |a_{i+1} - a_i|$
for all $i$ will tend to zero on a one dimensional
interval in parameter space. In other words, the
bifurcation chain set $U$ will be a-dense in its closure,
$\overline{U}$.

ii. In the asymptotic limit of high dimension, for all
$s \in U$, and for all $f$ at $s$, an arbitrarily small
perturbation $\delta s$ of $s$ will produce a topological change.
The topological change will correspond to a different
number of global stable and unstable manifolds
for $f$ at $s$ compared to $f$ at $s + \delta s$.

Assume $M$ is a $C^r$ manifold of topological dimension
$d$ and $N$ is a submanifold of $M$. The codimension of $N$
in $M$ is defined $codim(N) = dim(M) - dim(N)$. If there
exists a curve $p$ through $M$ such that $p$ is transverse to
$N$ and $codim(N) \le 1$, then there will not exist an
arbitrarily small perturbation to $p$ such that $p$ will become
non-transverse to $N$. Moreover, if $codim(N) = 0$ and
$p \cap N \subset int(N)$, then there does not even exist an
arbitrarily small perturbation of $p$ such that $p$ intersects $N$ at
a single point of $N$, i.e. the intersection cannot be made
non-transverse with an arbitrarily small perturbation.

The former paragraph can be more easily understood
via figure (3), where we have drawn four different
circumstances. The first circumstance, the curve $p_1 \cap N$, is an
example of a non-transversal intersection with a codimension
0 submanifold. This intersection can be perturbed
away with an arbitrarily small perturbation of $p_1$. The
intersection, $p_2 \cap N$, is a transversal intersection with a
codimension 0 submanifold, and this intersection cannot
be perturbed away with an arbitrarily small perturbation
of $p_2$. Likewise, the intersection, $p_1 \cap O$, which is an
example of a transversal intersection with a codimension
1 submanifold, cannot be made non-transverse or null via
an arbitrarily small perturbation of $p_1$. The intersection
$p_2 \cap O$ is a non-transversal intersection with a codimension
1 submanifold and can be perturbed away with an
arbitrarily small perturbation of $p_2$. This outlines the
avoidability of codimension 0 and 1 submanifolds with
respect to curves through the ambient manifold $M$. The
point is that non-null, transversal intersections of curves
with codimension 0 or 1 submanifolds cannot be made
non-transversal with arbitrarily small perturbations of
the curve. Transversal intersections of curves with
codimension 2 submanifolds, however, can always be removed
FIG. 3: The top drawing represents various standard pictures
from transversality theory. The bottom drawing represents
an idealized version (in higher dimensions) of transversality
catering to our arguments.



by an arbitrarily small perturbation due to the existence
of a "free" dimension. A practical example of such would
be the intersection of a curve with another curve in
$R^3$; one can always pull apart the two curves simply by
"lifting" them apart.

In the circumstance proposed in conjecture (2), the
set $U$ ($N$ in Fig. 3) will always have codimension $d$
because $U$ consists of finitely many points; thus any
intersection with $U$ can be removed by an arbitrarily small
perturbation. The point is that, as $U$ becomes a-dense
in $\overline{U}$, $p_3 \cap U = \emptyset$ becomes more and more unlikely and
the perturbations required to remove the intersections of
$p_3$ with $U$ (again, $N$ as in Fig. 3) will become
more and more bizarre. For a low-dimensional example,
think of a ball of radius $r$ in $R^3$ that is populated
by a finite set of evenly distributed points, denoted $S_i$,
where $i$ is the number of elements in $S_i$. Next fit a curve
$p$ through that ball in such a way that $p$ does not hit
any points in $S_i$. Now, as the cardinality of $S_i$ becomes
large, if $S_i$ is a-dense in the ball of radius $r$, for the
intersection of $p$ with $S_i$ to remain null, $p$ will need to
become increasingly kinky. Moreover, continuous, linear
transformations of $p$ will become increasingly unlikely to
preserve $p \cap S_i = \emptyset$. It is this type of behavior with
respect to parameter variation that we are arguing for
with conjecture (2). However, figure (3) should only
be used as a tool for intuition; our conjectures are
with respect to a particular interval in parameter space
and not a general curve in parameter space, let alone a
family of curves or a high-dimensional surface. Conjecture
(2) is a first step towards a more complete argument
with respect to the above scenario. For more information
for where the above picture originates, see [l^ or [50| . 

To understand roughly why we believe conjecture (2)
is reasonable, first take condition (1) for granted (we will
expend some effort showing where condition (1) is
reasonable). Next assume there are arbitrarily many Lyapunov
exponents near 0 along some interval of parameter space
and that the Lyapunov exponent zero-crossings can be
shown to be a-dense with increasing dimension. Further,
assume that on the aforementioned interval, $V$ is
LCE decreasing. Since varying the parameters continuously
on some small interval will move Lyapunov exponents
continuously, small changes in the parameters will
guarantee a continual change in the number of positive
Lyapunov exponents. One might think of this intuitively
relative to the parameter space as the set of Lyapunov
exponent zero-crossings forming a codimension 0 submanifold
with respect to the particular interval of parameter
space. However, we will never achieve such a situation in
a rigorous way. Rather, we will have an a-dense bifurcation
chain set $U$, which will have codimension 1 in $R$
with respect to topological dimension. As the dimension
of $f$ is increased, $U$ will behave more like a codimension
0 submanifold of $R$. Hence the metaphoric language,
codimension $\epsilon$ bifurcation set. The set $U$ will always be
a codimension one submanifold as it is a finite set of
points. Nevertheless, if $U$ tends toward being dense in
its closure, it will behave increasingly like a codimension
zero submanifold. This argument will not work for the
entirety of the parameter space, and thus we will show
where, to what extent, and under what conditions $U$
exists and how it behaves as the dimension of the network
is increased.

Conjecture 3 (Periodic window probability decreasing)
Assume $f$ is a mapping (neural network) as
defined in section (II A 4) and a bifurcation chain set $U$ as
per conjecture (1). In the asymptotic limit of high dimension,
the length of the bifurcation chain sets, $l = |a_n - a_1|$,
increases such that the cardinality of $U \to m$ where $m$ is
the maximum number of positive Lyapunov exponents for
$f$. In other words, there will exist an interval in parameter
space (e.g. $s \in (a_1, a_n) \sim (0.1, 4)$) where the probability
of the existence of a periodic window will go to zero
(with respect to Lebesgue measure on the interval) as the
dimension becomes large.

This conjecture is somewhat difficult to test for a specific
function since adding inputs completely changes the
function. Thus the curve through the function space is
an abstraction we are not afforded by our construction.
We will save a more complete analysis (e.g. a search
for periodic windows along a high-dimensional surface in
parameter space) of conjecture (3) for a different report.
In this work, conjecture (3) addresses a very practical
matter, for it implies the existence of a much smaller
number of bifurcation chain sets. The previous conjectures
allow for the existence of many of these bifurcation
chain sets, $U$, separated by windows of periodicity in
parameter space. However, if these windows of periodic
dynamics in parameter space vanish, we could end up
with only one bifurcation chain set, the ideal situation
for our arguments. We will not claim such; however, we
will claim that the length of the set $U$ we are concerning
ourselves with in a practical sense will increase with
increasing dimension, largely due to the disappearance
of periodic windows on the closure of $V$. With respect
to this report, all that needs be shown is that the window
sizes along the path in parameter space for a variety
of neural networks decrease with increasing dimension.
From a qualitative analysis it will be somewhat clear that
the above conjecture is reasonable.

If we were actually making statements we could rigorously
prove, conjectures (1), (2), and (3) would function
as lemmas for conjecture (4).

Conjecture 4 Assume $f$ is a mapping (neural network)
as defined in section (II A 4) with a sufficiently high number
of dimensions, $d$, a bifurcation chain set $U$ as per
conjecture (1), and the chain link set $V$. The perturbation
size $\delta s$ of $s \in C_{max}$, where $C_{max}$ is the largest connected
component of $V$, for which $f|_{C_k}$ remains structurally
stable goes to zero as $d \to \infty$.

The lack of density of structural stability in certain
sets of dynamical systems was proven long ago for
specific cases. These examples were, however, very
specialized and carefully constructed circumstances and
do not speak to the commonality of structural stability
failure. Along the road to investigating conjecture
(4) we will show that structural stability will not, in a
practical sense, be observable for a large set of very
high-dimensional dynamical systems along certain, important
intervals in parameter space even though structural
stability is a property that will exist on that interval with
probability one (with respect to Lebesgue measure). To
some, this conjecture might appear to contradict some
well-known results in stability theory. A careful analysis
of this conjecture and its relation to known results will
be given in sections (VII A 4) and (VII C 1).

The larger question that remains, however, is whether 
conjecture is valid on high-dimensional surfaces in pa- 
rameter space. We believe this is a much more difficult 
question with a much more complicated answer. We can, 
however, speak to a highly related problem, the problem 
of whether chaos persists in high-dimensional dynamical 
systems. Thus, let us now make a very imprecise conjecture
that we will make more precise in a later section.



Conjecture 5 Chaos is a robust, high-probability behav- 
ior for high- dimensional, bounded, nonlinear dynamical 
systems. 

This is not a revelation (as previously mentioned, many 
experimentalists have been attempting to break this ro- 
bust, chaotic behavior for the last hundred years), nor 
is it a particularly precise statement. We have studied
this question using neural networks much like those described
in section (II A 4), and we found that for high-dimensional
networks with a sufficient degree of nonlinearity,
the probability of chaos was near unity [51]. Over
the course of investigating the above claims, we will see
a qualitative verification of conjecture (5). A more complete
study will come from combining results from this
study with a statistical perturbation study, a study of
windows proposed by [52], and the closing lemma of
Pugh [53].

IV. NUMERICAL ERRORS IN LYAPUNOV
EXPONENT CALCULATION 

Before we commence with our numerical arguments for 
the above conjectures, we analyze the numerical errors 
for both insight into how our chief diagnostic works and 
to establish bounds of accuracy on the numerical results 
that will follow. We will proceed first with an analy- 
sis of single networks of varying dimensions, providing 
intuition into the evolution of the calculation of the Lyapunov
spectrum versus iteration time. We will follow this
analysis with a statistical study of 1000 networks, measuring
the deviation from the mean of the exponent over
10000 time steps, thus noting how the individual expo- 
nents converge and to what extent the exponents of all 
the networks converge. 
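
As a point of reference for the diagnostic analyzed in this section, the following is a minimal sketch of how a Lyapunov spectrum and its running estimate versus iteration can be computed by repeated re-orthonormalization (QR factorization, which is numerically equivalent to Gram-Schmidt). It is an illustration under stated assumptions, not the code used for this report: the function names are hypothetical, and the Henon map is used only as a stand-in because the networks of section (II A 4) are not reproduced here.

```python
import numpy as np

def lyapunov_spectrum(step, jacobian, x0, n_steps=5000):
    """Running Lyapunov spectrum estimates for the map x -> step(x).

    Returns an array of shape (n_steps, d); row t holds the estimate of
    all d exponents after t + 1 iterations, largest exponent first.
    """
    x = np.asarray(x0, dtype=float)
    d = x.size
    q = np.eye(d)                       # orthonormal basis of tangent vectors
    log_sums = np.zeros(d)
    running = np.empty((n_steps, d))
    for t in range(n_steps):
        z = jacobian(x) @ q             # push the basis through the linearized map
        q, r = np.linalg.qr(z)          # re-orthonormalize the basis
        log_sums += np.log(np.abs(np.diag(r)))
        running[t] = log_sums / (t + 1)
        x = step(x)
    return running

# Stand-in system (an assumption, NOT the neural networks of this report):
# the Henon map with the standard parameters a = 1.4, b = 0.3.
A, B = 1.4, 0.3

def henon_step(x):
    return np.array([1.0 - A * x[0] ** 2 + x[1], B * x[0]])

def henon_jacobian(x):
    return np.array([[-2.0 * A * x[0], 1.0], [B, 0.0]])

running = lyapunov_spectrum(henon_step, henon_jacobian, [0.0, 0.0])
spectrum = running[-1]
```

Plotting the columns of `running` against the iteration index gives the kind of spectrum-versus-iteration picture discussed below; for the stand-in map the sum of the two exponents equals the logarithm of the (constant) Jacobian determinant, which is a convenient sanity check.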

We will begin by considering Fig. (4), plots of the
Lyapunov spectrum versus the first 10000 iterations for
two networks with 16 and 64 dimensions. After approximately
3000 time steps, all the large transients have essentially
vanished, and aside from slight variation (especially
on a time scale long compared with a single time
step) the exponents appear to have converged. For the
case with 16 dimensions the exponents also appear to
have converged. The resolution for the network with 64
dimensions is not fine enough to verify a distinction between
exponents; Fig. (5), a close-up, demonstrates
clearly that the exponents converge well within
the inherent errors in the calculation, and are entirely
distinct for times greater than 5500 time steps. It is
worth noting that there are times when very long term 
transients occur in our networks. These transients would 
not be detectable from the figures we have presented, 
but these problem cases usually only exist near bifur- 
cation points. For the cases we are considering, these 
convergence issues do not seem to affect our results [g^l- 

Figures (4) and (5) provide insight into how the individual
exponents for individual networks converge; we
now must establish the convergence of the Lyapunov exponents
for a large set of neural networks and present a
general idea of the numerical variance ($\epsilon_m$) in the Lyapunov
exponents. We will achieve this in the following
manner: we will calculate the Lyapunov spectrum for an
individual network for 5000 time steps; we will calculate
the mean of each exponent in the spectrum; we will, for
each time step, calculate the deviation of the exponent
from the mean of that exponent; we will follow the above
procedure for 1000 networks and take the mean of the deviation
from the mean exponent at each time step. Figure
(6) represents the analysis in the preceding statement.
This figure demonstrates clearly that the deviation from
the mean exponent, even for the most negative exponent
(the most negative exponent has the largest error), drops
below 0.01 after 3000 time steps. The fluctuations in the
largest Lyapunov exponent lie in the 10"'^ range for 3000 
time-steps. Figure (6) also substantiates three notions:
a measurement of how little the average exponent strays
from its mean value; a measurement of the similarity of
this characteristic over the ensemble of networks; and finally
it helps establish a general intuition with respect to
the accuracy of our exponents, $\epsilon_m < 0.01$ for 5000 time
steps.
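
The ensemble procedure just described can be sketched as follows. This is a hedged illustration only: the function name and the array layout are hypothetical, and the use of the absolute value as the "deviation" is an assumption, since the text does not state which deviation measure was used.

```python
import numpy as np

def mean_deviation_from_mean(running):
    """Ensemble-averaged deviation of one exponent from its own time mean.

    running: array of shape (n_networks, n_steps); entry [k, t] is the
             running estimate of a given exponent (e.g. the largest) for
             network k at time step t.
    Returns an array of shape (n_steps,).
    """
    running = np.asarray(running, dtype=float)
    time_mean = running.mean(axis=1, keepdims=True)   # mean of the exponent per network
    deviation = np.abs(running - time_mean)           # assumed: absolute deviation
    return deviation.mean(axis=0)                     # average over the ensemble

# Tiny synthetic example with two "networks" and two time steps.
example = mean_deviation_from_mean([[1.0, 3.0], [2.0, 2.0]])
```

Applied to 1000 networks and 5000 time steps, the returned curve would be the analogue of one trace in the ensemble figure discussed above.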

It is worth noting that determining errors in the Lyapunov
exponents is not an exact science; for our networks
such errors vary a great deal in different regions of
s space. For instance, a network near the first bifurcation
from a fixed point can require up to 100000 or more iterations
to converge to an attractor and 50000 more iterations for
the Lyapunov spectrum to converge.

V. NUMERICAL ARGUMENTS FOR 
PRELIMINARIES 

Before we present our arguments supporting our conjectures
we must present various preliminary results.
Specifically, we will discuss the num-continuity of the
Lyapunov exponents and the a-density of Lyapunov exponent
zero-crossings, and argue for the existence of an arbitrarily
high number of positive exponents given an arbitrarily
high number of dimensions. With these preliminaries
in place, the arguments supporting our conjectures
will be far more clear.



A. num-continuity

Testing for the num-continuity of Lyapunov exponents
formally will be two-fold. First, we will need to
investigate, for a specific network, f, the behavior of Lyapunov
exponents versus variation of parameters. Second,
indirect, yet strong evidence of the num-continuity will
also come from investigating how periodic window size
varies with dimension and parameter variation. It is important
to note that when we refer to continuity, we are
referring to a very local notion of continuity. Continuity
is always in reference to the set upon which something
(a function, a mapping, etc.) is continuous. In the analysis
below, the neighborhoods upon which continuity of the
Lyapunov exponents is examined are ranges of plus
and minus one parameter increment. This is all that is
necessary for our purposes, but this analysis cannot guarantee
strict continuity along, say, $s \in [0.1, 10]$, but rather
continuity along little linked bits of the interval [0.1, 10].

1. Qualitative analysis 

Qualitatively, our intuition for num-continuity comes
from examining hundreds of Lyapunov spectrum plots
versus parameter variation. In this vein, Figs. (7) and
(8) present the difference between low and higher dimensional
Lyapunov spectra.

In Fig. (8), the Lyapunov exponents look continuous
within numerical errors (usually ±0.005). Figure (8)
by itself provides little more than an intuitive picture of

FIG. 4: LE spectrum versus iteration for individual networks with 32 neurons and 16 (left, only the largest 8 are shown) and
64 (right) dimensions.

FIG. 5: Close-up of LE spectrum versus iteration: 32 neurons,
64 dimensions.



what we are attempting to argue. As we will be making
arguments that the Lyapunov spectrum will become
more smooth, periodic windows will disappear, etc., with
increasing dimension, Fig. (7) shows a typical graph of
the Lyapunov spectrum versus parameter variation for a
neural network with 32 neurons and 4 dimensions. The
contrast between Figs. (8) and (7) intuitively demonstrates
the increase in continuity we are claiming.

Although a consideration of Figs. (7) and (8) yields
the observation that, as the dimension is increased, the
Lyapunov exponents appear to be a more continuous function
of the s parameter, the above figures alone do not
verify num-continuity. In fact, it should be noted that
pathological discontinuities have been observed in networks
with as many as 32 dimensions. The existence of
pathologies for higher dimensions is not a problem we are
prepared to answer in depth; it can be confidently said
that as the dimension (number of inputs) is increased,
the pathologies appear to become vanishingly
rare (this is noted over our observation of several
thousand networks with dimensions ranging from 4 to
256).



2. Quantitative and numerical analysis 

Our quantitative analysis will follow two lines. The
first will be a specific analysis along the region of parameter
change for two networks with dimensions 4 and 64,
respectively. This will be followed with a more statistical
study of a number of networks per dimension where the
dimensions will range from 4 to 128 in powers of 2.

Consider the num-continuity of two different networks
while varying the s parameter. Figure (9) is a
plot of the mean difference in each exponent between
parameter values summed over all the exponents. The
parameter increment is $\delta s = 0.01$.
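
The quantity plotted for a single network can be sketched as below. This is an illustrative reading of the description above, not the authors' code: the function name and array layout are hypothetical, and it assumes that the exponents in each row are ordered consistently (e.g. largest first) so that differences are taken between corresponding exponents.

```python
import numpy as np

def num_continuity(spectra):
    """Mean absolute change of the Lyapunov exponents between successive s values.

    spectra: array of shape (n_s, d); row j is the full spectrum at the
             j-th parameter value, s_j = s_0 + j * delta_s.
    Returns an array of shape (n_s - 1,), one value per parameter increment.
    """
    spectra = np.asarray(spectra, dtype=float)
    change = np.abs(np.diff(spectra, axis=0))   # |chi_i(s + delta_s) - chi_i(s)|
    return change.mean(axis=1)                  # mean over the d exponents

# Synthetic two-exponent example at three parameter values.
example = num_continuity([[0.00, 0.00],
                          [0.02, -0.04],
                          [0.02, -0.04]])
```

Large spikes in the returned array mark parameter values where the spectrum changes abruptly, as happens at the edge of a periodic window.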

The region of particular interest is between s = 0 and
6. Considering this range, it is clear that the variation in
the mean of the exponents versus variation in s decreases
with dimension. The 4-dimensional network not only has
a higher baseline of num-continuity, but it also has many
large spikes. As the dimension is increased, considering
the 64-dimensional case, the baseline of num-continuity
is decreased, and the large spikes disappear. The spikes
in the 4-dimensional case can be directly linked to the
existence of periodic windows and bifurcations that result
in dramatic topological change. This is one verification
of num-continuity of Lyapunov exponents. These two
cases are quite typical, but it is clear that the above analysis,
although quite persuasive, is not adequate for our
needs. We will thus resort to a statistical study of the
above plots.

The statistical support we have for our claim of increased
num-continuity will focus on the parameter region
between s = 0 and 6, the region in parameter space
over which the maxima of entropy, Kaplan-Yorke dimension,
and the number of positive Lyapunov exponents
exist. Figure (10) considers the num-continuity along
parameter values ranging from 0 to 6. The points on the
plot correspond to the mean (over a few hundred networks)
of the mean exponent change between parameter

FIG. 6: Mean deviation from the mean of the largest and most negative Lyapunov exponent per time-step for an ensemble of
1000 networks with 32 neurons and 16 (left) and 64 (right) dimensions.

FIG. 7: LE spectrum: 32 neurons, 4 dimensions.



values, or:

$$\frac{1}{Z}\sum_{k=1}^{Z}\frac{1}{d}\sum_{i=1}^{d}\left|\chi_i^{k}(s+\delta s)-\chi_i^{k}(s)\right|$$

where Z is the total number of networks of a given dimension
considered.

Figure (10) clearly shows that as the dimension is increased,
for the same computation time, both the mean
exponent change versus parameter variation per network
and the standard deviation of the exponent change de- 
crease substantially as the dimension is increased. ^GSj Of 
course the mean change over all the exponents allows for
the possibility of one exponent (possibly the largest exponent)
undergoing a relatively large change while the
other exponents change very little. For this reason, we
have included the num-continuity of the largest and the
most negative exponents versus parameter change. The
num-continuity of the largest exponents is very good,
displaying a small standard deviation across many networks.
The error in the most negative exponent is inherent
to our numerical techniques (specifically the Gram-Schmidt
orthogonalization). The error in the most negative
exponent increases with dimension, but is a numerical
artifact. This figure yields strong evidence that in the
region of parameter space where the network starts at a
fixed point (all negative Lyapunov exponents), grows to
having the maximum number of positive exponents, and
returns to having a few positive exponents, the variation
in any specific Lyapunov exponent is very small.

There is a specific relation between the above data
and definition (12); num-Lipschitz is a stronger condition
than num-continuity of Lyapunov exponents. The mean

FIG. 8: LE spectrum: 32 neurons, 64 dimensions.

FIG. 9: num-continuity (mean of $|\chi_i(s) - \chi_i(s + \delta s)|$ for
each i) versus parameter variation: 32 neurons, 4 (left) and
64 (right) dimensions.



num-continuity at n = 32, d = 4 gives

$$|\chi_j(s + \delta_{num}) - \chi_j(s)| < k\,\delta_{num} \qquad (31)$$

$$|0.02| < k\,|0.01| \qquad (32)$$

yielding k = 2, which would not classify as
num-Lipschitz contracting, whereas for n = 32, d = 128

FIG. 10: Mean num-continuity, num-continuity of the
largest and the most negative Lyapunov exponent of many
networks versus their dimension. The error bars are the standard
deviation about the mean over the number of networks
considered.



we arrive at

$$|\chi_j(s + \delta_{num}) - \chi_j(s)| < k\,\delta_{num} \qquad (33)$$

$$|0.004| < k\,|0.01| \qquad (34)$$

which yields k = 0.4 < 1, which does satisfy the condition
for num-Lipschitz contraction. Even more striking
is the num-continuity of only the largest Lyapunov exponent;
for n = 32, d = 4 we get

$$|\chi_j(s + \delta_{num}) - \chi_j(s)| < k\,\delta_{num} \qquad (35)$$

$$|0.015| < k\,|0.01| \qquad (36)$$

which yields k = 1.5, while the n = 32, d = 128 case is

$$|\chi_j(s + \delta_{num}) - \chi_j(s)| < k\,\delta_{num} \qquad (37)$$

$$|0.002| < k\,|0.01| \qquad (38)$$

FIG. 11: k-scaling: $\log_2$ of dimension versus $\log_2$ of
num-Lipschitz constant of the largest Lyapunov exponent.



which nets k = 0.2. As the dimension is increased, k
decreases, and thus num-continuity increases. As can
be seen from Fig. (10), num-continuity is achieved
rather quickly as the dimension is increased; the Lyapunov
exponents are quite continuous with respect to
parameter variation by 16 dimensions. For an understanding
in the asymptotic limit of high dimension, consider
Fig. (11). As the dimension is increased, the $\log_2$
of the dimension versus the $\log_2$ of the num-Lipschitz
constant k yields the scaling $k \sim \sqrt{1/d}$; thus as
$d \to \infty$, $k \to 0$, which is exactly
what we desire for continuity in the Lyapunov exponents
versus parameter change. This completes our evidence
for num-continuity in high-dimensional networks.
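
The arithmetic behind the k values quoted above, and the log-log fit used to read off a scaling, can be sketched as follows. The function names are hypothetical, and the two-point "fit" uses only the two largest-exponent values quoted in the text (d = 4 and d = 128), so it is a rough consistency check rather than a reproduction of the fit over all dimensions shown in the figure.

```python
import numpy as np

def lipschitz_constant(delta_chi, delta_s=0.01):
    """Smallest k with |chi(s + delta_s) - chi(s)| <= k * delta_s."""
    return delta_chi / delta_s

def scaling_exponent(dims, ks):
    """Least-squares slope of log2(k) versus log2(d)."""
    slope, _intercept = np.polyfit(np.log2(dims), np.log2(ks), 1)
    return slope

# Largest-exponent values quoted in the text for n = 32 neurons.
k_d4 = lipschitz_constant(0.015)     # d = 4
k_d128 = lipschitz_constant(0.002)   # d = 128
slope = scaling_exponent([4, 128], [k_d4, k_d128])
```

With these two points the slope comes out near -0.58, i.e. k falls off somewhat faster than $d^{-1/2}$ over this range; any value of k below 1 corresponds to the num-Lipschitz contracting case.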

3. Relevance 

Conjectures (^3), @, and are all fundamentally 
based on condition For the neural networks, all we 
need to establish conjecture ^ is the num— continuity 
of the Lyapunov exponents, the existence of the fixed
point for s near 0, the periodic orbits for $s \to \infty$, and
three exponents that are, over some region of parameter
space, all simultaneously positive. The num-continuity
of Lyapunov exponents implies, within numerical precision,
that Lyapunov exponents both pass through zero
(and do not jump from positive to negative without passing
through zero) and are, within numerical precision,
zero.



B. a-density of zero crossings

Many of our arguments will revolve around varying s
in a range of 0.1 to 6 and studying the behavior of the
Lyapunov spectrum. One of the most important features
of the Lyapunov spectrum we will need is a uniformity
in the distribution of positive exponents between 0 and
$\chi_{max}$. As we are dealing with a countable set, we will

FIG. 12: Positive LE spectrum for typical individual networks
with 32 neurons and 16 (top) and 64 (bottom) dimensions.



refrain from any type of measure theoretic notions, and
instead rely on a-density of the set of positive exponents
as the dimension is increased. Recall the definition of
a-dense (definition (15)), the definition of a bifurcation
chain subset (definition (18)), which corresponds to the
set of Lyapunov exponent zero crossings, and the definition
of a chain link set (definition (16)). Our conjectures
will make sense if and only if, as the dimension is
increased, the bifurcation chain subsets become "increasingly"
dense, or a-dense, in the closure of the chain link
set (V). It is the notion of an a-dense bifurcation chain set in
the closure of the chain link set as dimension is increased
that provides us with the convergence to density of non-hyperbolic
points we need to satisfy our goals.



1. Qualitative analysis 

The qualitative analysis will focus on pointing out
what characteristics we are looking for and why we believe
a-density of Lyapunov exponent zero-crossings (an a-dense
bifurcation chain set in the closure of the chain link
set) over a particular region of parameter space exists. A
byproduct of this analysis will be a picture of one of the
key traits needed to support our conjectures. We will begin
with figures showing the positive Lyapunov spectrum
for 16 and 64 dimensions.

Consider the 16-dimensional case, and split the
s parameter variation into two regions: region one,
$R_I = [0, 0.5]$, and region two, $R_{II} = [0.5, 10]$. We then
partition $R_{II}$ using the bifurcation link sets, and collect
the zero crossings in the bifurcation chain sets.

We want the elements of the bifurcation chain sets to 
be spaced evenly enough so that, as the dimension goes 
to infinity, variations in the s parameter on the chain 
link set will lead to a Lyapunov exponent zero-crossing 
(and a transition from Vi to T^i±i)[63. Considering re- 
gion //[Z^, we wish for the distance along the s axis 
between Lyapunov exponent zero-crossings (elements of
the bifurcation chain subset) to decrease as the dimension
is increased. If, as the dimension is increased, the
Lyapunov exponents begin to "bunch up" and cease to
be at least somewhat uniformly distributed, the rest of
our arguments will surely fail. For instance, in region
two of the bottom plot of Fig. (12), if the Lyapunov exponents
were "clumped," there would be many holes where
variation of s would not imply an exponent crossing. Luckily,
considering the 64-dimensional case as given in Fig.
(12), our desires seem to be met, as the spacing between exponent
zero-crossings is clearly decreasing as the dimension
is increased (consider the region [0.5, 4]), and there are
no point accumulations of exponents. It is also reassuring
to note that even at 16 dimensions, and especially
at 64 dimensions, the Lyapunov exponents are quite distinct
and look num-continuous as previously asserted.
The above figures are, of course, only a picture of two
networks; if we wish for a more conclusive statement, we
will need arguments of a statistical nature.

2. Quantitative and numerical analysis 

Our analysis that specifically targets the a-density of
Lyapunov exponent zero crossings focuses on an analysis
of plots of the number of positive exponents versus the s
parameter.

Qualitatively, the two examples given in Fig. (13)
(both of which typify the behavior for their respective
number of neurons and dimensions) exemplify the
a-density for which we are searching. As the dimension
is increased, the plot of the variation in the number
of positive exponents versus s becomes more smooth [71],
while the width of the peak becomes more narrow. Thus,
the slope of the number of positive exponents versus s
between $s = s^*$ ($s^*$ is the s where there exists the maximum
number of positive Lyapunov exponents) and s = 2
drops from -3 at d = 32 to -13 at d = 128. Noting that
the more negative the slope, the less variation in s is required
to force a zero-crossing, it is clear that this implies
a-density of zero-crossings. We will not take that line
of analysis further, but rather will give brute force evidence
for a-density by directly noting the mean distance
between exponent zero-crossings.



From Fig. (14), it is clear that as the dimension of

FIG. 13: Number of positive LE's for typical individual networks
with 32 neurons and 32 (top) and 128 (bottom) dimensions.

FIG. 14: Mean distance between the first 10 zero crossings of
LE's for many networks with 32 neurons and 16, 32, 64, and
128 dimensions.



the network is increased, the mean distance between
successive exponent zero-crossings decreases. Note that
measuring the mean distance between successive zero-crossings
verifies, in both an intuitive and a brute force manner,
the sufficient condition for the a-density of the
set of s values for which there exist zero-crossings of exponents.
The error bars represent the standard deviation
of the length between zero-crossings over an ensemble
of networks (several hundred for low dimensions, on the order of
a hundred for d = 128). For the cases where
the dimension was 16 and 32, the s increment resolution
was $\delta s = 0.01$. The error in the zero crossing distance
for these cases is, at the smallest, 0.02, while at its smallest
the zero crossing distance is 0.49; thus a resolution of
0.01 in s variation is sufficient to adequately resolve the
zero crossings. Such is not the case for 64 and 128 dimensional
networks. For these cases we were required
to increase the s resolution to 0.005. The zero crossings
of a few hundred networks considered were all examined
by hand; the distances between the zero crossings were
always distinct, with a resolution well below that necessary
to determine the zero crossing point. The errors
were also determined by hand, noting the greatest and
least reasonable points for the zero crossing. All the zero
crossings were determined after the smallest positive exponent
that became positive hit its peak value, i.e. after
approximately 0.75 in the d = 16 case of Fig. (12).
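
An automated counterpart of the by-hand measurement described above might look like the following. This is only a sketch under stated assumptions: the function names are hypothetical, crossings are located by linear interpolation between grid points (the text describes a manual determination instead), and only downward crossings (positive to non-positive) are counted, as a stand-in for the rule that crossings are taken after an exponent has passed its peak.

```python
import numpy as np

def downward_zero_crossings(s, chi):
    """s values where one exponent chi(s) goes from positive to non-positive."""
    s = np.asarray(s, dtype=float)
    chi = np.asarray(chi, dtype=float)
    idx = np.nonzero((chi[:-1] > 0) & (chi[1:] <= 0))[0]
    s0, s1 = s[idx], s[idx + 1]
    c0, c1 = chi[idx], chi[idx + 1]
    return s0 - c0 * (s1 - s0) / (c1 - c0)    # linear interpolation to chi = 0

def mean_crossing_distance(s, spectra, first=10):
    """Mean spacing in s between the first `first` crossings of all exponents.

    spectra: array of shape (n_s, d); column i is exponent i versus s.
    """
    spectra = np.asarray(spectra, dtype=float)
    crossings = np.concatenate(
        [downward_zero_crossings(s, spectra[:, i]) for i in range(spectra.shape[1])]
    )
    crossings = np.sort(crossings)[:first]
    return float(np.diff(crossings).mean()) if crossings.size > 1 else float("nan")

# Synthetic example: two exponents crossing zero at s = 0.5 and s = 1.5.
s_grid = [0.0, 1.0, 2.0, 3.0]
toy = np.array([[1.0, 2.0], [-1.0, 1.0], [-1.0, -1.0], [-1.0, -2.0]])
distance = mean_crossing_distance(s_grid, toy)
```

The grid spacing in s must be well below the typical crossing distance for this to be meaningful, which is the reason given above for refining the resolution at 64 and 128 dimensions.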

FIG. 15: Mean maximum number of positive LE's versus dimension;
all networks have 32 neurons (slope is approximately



3. Relevance 

The a-density of zero crossings of Lyapunov exponents
provides the most important element in our
arguments of conjectures and (O; combining 

num-continuity with a-density will essentially net our
desired results. If continuity of Lyapunov exponents increases,
and the density of zero crossings of exponents
increases over a set U G of parameter space, it seems 
clear that we will have both hyperbolicity violation and, 
upon variation of parameters in U, we will have the topo- 
logical change we are claiming. Of course small issues re- 
main, but those will be dealt with in the final arguments. 



C. Arbitrarily large number of positive exponents 

For our a-density arguments to work, we need a set
whose cardinality is asymptotically a countably infinite
set (such that it can be a-dense in itself) and we need
the distance between the elements in the set to approach
zero. The latter characteristic was the subject of the previous
section; the former is what we intend to
address in this section.



1. Qualitative analysis

The qualitative analysis of this can be seen in Fig.
(13): as the dimension is increased, the maximum number
of positive Lyapunov exponents clearly increases. We
wish to quantify that the increase in the number of positive
exponents versus dimension occurs for a statistically
relevant set of networks.

2. Quantitative analysis

We will use a brute force argument to demonstrate the
increase in positive Lyapunov exponents with dimension;
we will simply plot the number of positive exponents at
the maximum number of positive exponents as dimension is
increased. We claim that the number of positive Lyapunov
exponents increases and, in fact, diverges to infinity as the
dimension of the network is taken to infinity. Figure
(15) shows the number of positive Lyapunov exponents
versus dimension.

From Fig. (15) it is clear that as the dimension is increased,
the number of positive exponents increases in a
nearly linear fashion [72]. Further, this plot is linear to as
high a dimension as the authors could compute enough
cases for reasonable statistics. If the maximum number
of exponents versus dimension remains linear beyond the
range we could compute, we will have the countably infinite
number of positive exponents we require.

3. Relevance

The importance of the increasing number of positive
exponents with dimension is quite simple. For the
a-density of exponent zero crossings to be meaningful in
the infinite-dimensional limit, there must also be an arbitrarily
large number of positive exponents that can cross
zero. If, asymptotically, there is a finite number of positive
exponents, all of our claims will be false; a-density
requires a countably infinite set.
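
The brute force count described in this section can be sketched as follows. The function names are hypothetical, and the numbers in the example are synthetic placeholders chosen only to exercise the code; they are not data from this report.

```python
import numpy as np

def max_positive_exponents(spectra, tol=0.0):
    """Largest number of simultaneously positive exponents along the s sweep.

    spectra: array of shape (n_s, d); row j is the spectrum at the j-th s value.
    tol:     exponents must exceed tol to count as positive (an assumption).
    """
    spectra = np.asarray(spectra, dtype=float)
    return int((spectra > tol).sum(axis=1).max())

def linear_slope(dims, counts):
    """Least-squares slope of the maximum count versus dimension."""
    return float(np.polyfit(dims, counts, 1)[0])

# Synthetic spectrum at three s values for a 2-dimensional toy case.
count = max_positive_exponents([[-1.0, -2.0], [0.5, -1.0], [0.5, 0.2]])

# Synthetic, exactly linear placeholder data (NOT measured values).
slope = linear_slope([16, 32, 64], [4, 8, 16])
```

Averaging `max_positive_exponents` over an ensemble at each dimension and fitting a line gives the kind of summary plotted in the figure for this section.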

VI. NUMERICAL ARGUMENTS FOR
CONJECTURES 

A. Decreasing window probability 

With the num-continuity and a-density arguments
in place, all the evidence required to show how the
length of periodic windows along a curve in parameter
space behaves is already available. We will present a bit of new
data, but primarily we will clarify exactly what the conjecture
says. We will also list the specifics under which
the conjecture applies in our circumstances.

1. Qualitative analysis 

Qualitative evidence for the disappearance of periodic
windows amidst chaos is evident from Figs. (7),
(8), and (12): the periodic windows that dominate the
4-dimensional network over the parameter range s = 0 to
10 are totally absent in the 64-dimensional network. It
is important to note that for this conjecture, as well as
all our conjectures, we are considering the s parameter
over ranges no larger than 0 to 10. We will avoid, for the
most part, the "route to chaos" region (s near zero), as it
yields many complex issues that will be saved for another
report. We will instead consider the parameter region
after the lowest positive exponent first becomes positive.
We could consider parameter ranges considerably larger,
but for s very large, the round-off error begins to play a
significant role, and the networks become binary. This
region has been briefly explored in further analysis 
is necessary for a more complete understanding |55l |. 

2. Quantitative and numerical analysis 

The quantitative analysis we wish to perform will involve
arguments of two types: those that are derived from
data given in sections (V A) and (V B), and those that
follow from statistical data regarding the probability of
a window existing for a given s along an interval in $\mathbb{R}$.
We begin by recalling what we are attempting to claim
and what conditions we need to verify the claim. We will
then present the former argument and conclude with the
latter.

The conjecture we are investigating claims that as
the dimension of a dynamical system is increased, periodic
windows along a one-dimensional curve in parameter
space vanish in a significant portion of parameter
space for which the dynamical system is chaotic. This
is, of course, dependent upon the region of parameter
space one is observing, and there is likely no way to
rid ourselves of such an issue. For our purposes, we will
generally be investigating the region of s parameter space
between 0.1 and 10; sometimes, however, we will limit the
investigation to s between 2 and 4. Little changes if we
increase s until the network begins behaving as a binary
system due (quite possibly) to the round-off error. However,
along the transition to the binary region, there are
significant complications which we will not address here.
As the dimension is increased, the main concern is that
the lengths of the bifurcation chain sets must increase
such that there will exist at least one bifurcation chain
set that has a cardinality approaching infinity as the dimension
of the network approaches infinity.

Our first argument is based directly upon the evidence
of num-continuity of Lyapunov exponents. From Fig.
(10) it is clear that as the dimension of the set of networks
sampled is increased, the mean difference in Lyapunov
exponents over small ($\delta s = 0.01$) s parameter perturbation
decreases. This increase in num-continuity
of the Lyapunov exponents with dimension over our parameter
range is a direct result of the disappearance of
periodic windows from the chaotic regions of parameter
space. This evidence is amplified by the decrease in the
standard deviation of the num-continuity versus dimension
(of both the mean of the exponents and the largest
exponent). This decrease in the standard deviation of
the num-continuity of the largest Lyapunov exponent
allows for the existence of fewer large deviations in Lyapunov
exponents (large deviations are needed for all the
exponents to suddenly become less than or equal to zero).

We can take this analysis a step further and simply calculate
the probability of an s value having a periodic orbit
over a given interval. Figure (16) shows the probability
of a periodic window existing for a given s on the interval
(2, 4) with $\delta s = 0.001$ for various dimensions. There is a
power law in the probability of periodic windows: the
probability of the existence of a periodic window decreases
approximately as $1/d$. Moreover, the authors have observed
that in high-dimensional dynamical systems, when
periodic windows are observed on the interval (2, 4), they
are usually large in length. In other words, even though
the probability that a given s value will yield a periodic
orbit for d = 64 is 0.02, it is likely that the probability
is contained in a single connected window, as opposed
to the lower dimensional scenario where the probability
of window occurrence is distributed over many windows.
We will save further analysis of this conjecture for a different
report [56], but hints as to why this phenomenon is
occurring can be found in [52].
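
The window probability and its power-law fit can be sketched as below. This is a hedged illustration: the function names are hypothetical, and classifying an s value as lying in a periodic or quasi-periodic window when its largest Lyapunov exponent does not exceed a small tolerance is an assumption about the criterion, not a statement of the procedure actually used. The fit is done on synthetic data that follow an exact power law.

```python
import numpy as np

def window_probability(largest_exponent, tol=0.01):
    """Fraction of sampled s values whose largest exponent is <= tol."""
    le = np.asarray(largest_exponent, dtype=float)
    return float(np.mean(le <= tol))

def power_law_fit(dims, probs):
    """Fit probs ~ c * d**p by least squares in log2-log2 coordinates."""
    slope, intercept = np.polyfit(np.log2(dims), np.log2(probs), 1)
    return 2.0 ** intercept, slope        # (prefactor c, exponent p)

# Largest exponent sampled at four s values: two are not positive.
p_window = window_probability([0.5, 0.0, -0.2, 0.3])

# Synthetic probabilities obeying 2 / d exactly (NOT measured values).
dims = np.array([8.0, 16.0, 32.0, 64.0])
prefactor, exponent = power_law_fit(dims, 2.0 / dims)
```

On the interval (2, 4) with an increment of 0.001 there are roughly 2000 sampled s values per network, so a probability of 0.02 corresponds to about 40 of them falling inside windows.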



3. Relevance 

Decreasing window probability inside the chaotic re- 
gion provides direct evidence for conjectures Q and (O 
along a one-dimensional interval in parameter space. We 
will, in a more complete manner, attack those conjectures 
in a different report. We will use the decreasing periodic 
window probability to help verify conjecture since it 
provides the context we desire with the num— continuity 
of the Lyapunov spectrum. Our argument requires that 
there exists at least one maximum in the number of posi- 
tive Lyapunov exponents with parameter variation. Fur- 

FIG. 16: $\log_2$ of the probability of periodic or quasi-periodic
windows versus logj of dimension. The line — 2.16d~^'^^^ 
is the least squares fit of the plotted data. 



ther, that maximum must increase monotonically with 
the dimension of the system. The existence of periodic 
windows causes the following problems: periodic windows
can still yield structural instability, but in a catastrophic
way; periodic windows split up our bifurcation
chain sets, which, despite not being terminal to our arguments,
provides many complications with which we do
not contend. However, we do observe a decrease in periodic
windows, and with the decrease in the (numerical)
existence of periodic windows comes the decrease in the
number of bifurcation chain sets; i.e. $l = |a_n - a_1|$ is
increasing yet will remain finite.



B. Hyperbolicity violation

We will present two arguments for hyperbolicity vio- 
lation - or nearness to hyperbolicity violation of a map 
at a particular parameter value, s. The first argument 
will consider the fraction of Lyapunov exponents near 
zero over an ensemble of networks versus variation in the 
s parameter. If there is any hope of the existence of 
a chain link set with bifurcation link sets of decreasing 
length, our networks (on the s interval in question) must 
always have a Lyapunov exponent near zero. The second 
argument will come implicitly from a-density arguments 
presented in section (V B). To argue for this conjecture, 
we only need the existence of a neutral directionjT^, or, 
more accurately, at least two bifurcation link sets, which 
is not beyond reach. 



1. Qualitative analysis 

A qualitative analysis of hyperbolicity violation comes 
from combining the num-continuity of the exponents in 
Fig. ^ and the evidence of exponent zero crossings from 
Figs. ((T^ and ((1111) . If the exponents are continuous with 
respect to parameter variation (at least locally) and they 
start negative, become positive, and eventually become 
negative again, then they must be zero (within numerical 
precision) for at least two points in the parameter space. 
It happens that the bifurcation chain link sets are LCE 
decreasing from i to i + 1, which will provide additional, 
helpful, structure. 

FIG. 17: Mean fraction of LE's near zero (0 ± 0.01) for net- 
works with 32 neurons and 32 or 64 dimensions (averaged over 
100 networks). 
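The intermediate-value reasoning above can be sketched numerically: given an exponent sampled on a grid of s values, each sign change between neighbouring samples brackets a zero crossing. This is a minimal illustration, not the procedure used for our figures; the function name `zero_crossings`, the linear interpolation, and the toy exponent curve are all assumptions.

```python
import numpy as np

def zero_crossings(s, lam):
    """Estimate the s values at which a sampled exponent curve crosses zero.

    Linear interpolation between neighbouring samples is an illustrative
    assumption.
    """
    s = np.asarray(s, dtype=float)
    lam = np.asarray(lam, dtype=float)
    crossings = []
    for k in range(len(s) - 1):
        a, b = lam[k], lam[k + 1]
        if a == 0.0:
            # A sample that lands exactly on zero counts as a crossing point.
            crossings.append(s[k])
        elif a * b < 0.0:
            # Sign change: a continuous curve must vanish between s[k] and s[k+1].
            t = a / (a - b)
            crossings.append(s[k] + t * (s[k + 1] - s[k]))
    return crossings

# Hypothetical exponent: negative, then positive, then negative again.
s_grid = np.linspace(0.0, 4.0, 400)
lam_toy = -(s_grid - 1.0) * (s_grid - 3.0)
roots = zero_crossings(s_grid, lam_toy)
print(roots)
```

An exponent that starts negative, becomes positive, and ends negative yields at least two such crossings, which is the structure the argument relies on.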



2. Quantitative and numerical analysis 



The first argument, which is more of a necessary but 
not sufficient condition for the existence of hyperbolicity 
violation, consists of searching for the existence of Lya- 
punov exponents that are zero within allowed numerical 
errors. With num-continuity, this establishes the existence 
of exponents that are numerically zero. For an intuitive 
feel for what numerically zero means, consider the 
oscillations in Fig. (13) of the number of positive expo- 
nents versus parameter variation. It is clear that as they 
cross zero there are numerical errors that cause an appar- 
ent oscillation in the exponent; these oscillations are due 
largely to numerical fluctuations in the calculations [t^. 
There is a certain fuzziness in numerical results that is 
impossible to remove, thus questions regarding exponents 
being exactly zero are ill-formed. Numerical results of the 
type presented in this paper need to be viewed in a frame- 
work similar to physical experimental results. With this 
in mind, we need to note the significance of the exponents 
near zero. To do this, we calculate the relative number of 
Lyapunov exponents numerically at zero compared to the 
ones away from zero. All this information can be summa- 
rized in Fig. (17), which addresses the mean fraction of 
exponents that are near zero versus parameter variation. 

The cut-off for an exponent being near zero is ±0.01, 
which is approximately the expected numerical error in 
the exponents for the number of iterations we are using. 
There are four important features to notice about Fig. 
(17): there are no sharp discontinuities in the curves; 
there exists an interval in parameter space such that there 
is always at least one Lyapunov exponent in the interval 
(-0.01, 0.01), and the length of that parameter interval is 
increasing with dimension; the curves are concave, implying 
that exponents are somehow leaving the interval 
(-0.01, 0.01); and there is a higher fraction of exponents 
near zero at the same s value for higher dimension. The 
first property is important because holes in the parame- 
ter space where there are no exponents near zero would 
imply the absence of the continuous zero crossings we 
will need to satisfy conjecture 10). To satisfy conjecture 
(|T|) we only need three exponents to be near zero and un- 
dergo a zero crossing for the minimal bifurcation chain 
subset [t^ to exist. There are clearly enough exponents 
on average for such to exist for at least some interval in 
parameter space at d = 32, e.g. for (0.1,0.5). For d = 64 
that interval is much longer: (0.1, 1). Finally, if we 
want the chain link set to be more connected and for the 
distance between elements of the bifurcation chain subset 
to decrease, we will need the fraction of exponents near 
zero for the fixed interval (-0.01, 0.01) for a given inter- 
val in s to increase with dimension. This figure does not 
imply that there will exist zero-crossings, but it provides 
the necessary circumstance for our arguments. 
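The quantity plotted in Fig. 17 can be sketched as follows: for each network, count the exponents inside (-0.01, 0.01), divide by the number of exponents, and average over the ensemble. The function name `fraction_near_zero` and the toy spectra are hypothetical; only the cut-off of 0.01 comes from the text.

```python
import numpy as np

def fraction_near_zero(spectra, tol=0.01):
    """Fraction of Lyapunov exponents with |exponent| < tol, per network.

    `spectra` has shape (number of networks, number of exponents).
    """
    spectra = np.asarray(spectra, dtype=float)
    near = np.abs(spectra) < tol      # True where an exponent is numerically zero
    return near.mean(axis=-1)         # fraction for each network

# Hypothetical spectra for two 4-dimensional networks at one value of s.
toy_spectra = np.array([
    [0.005, -0.2, 0.3, -0.009],
    [0.5, 0.02, -0.001, -1.0],
])

per_network = fraction_near_zero(toy_spectra)
ensemble_mean = per_network.mean()
print(per_network, ensemble_mean)
```

Repeating this at each s value and plotting `ensemble_mean` against s gives a curve of the kind discussed above.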

The second argument falls out of the a-density and 
num-continuity arguments. We know that as the dimen- 
sion is increased, the variation of Lyapunov exponents 
versus parameter variation decreases until, at dimension 
64, the exponent variation varies continuously within nu- 
merical errors (and thus upon moving through zero, the 
exponent moves through zero continuously). We also 
know that on the interval in parameter space A = [0.1, 6], 
the distance between exponent zero crossings decreases 
monotonically. Further, on this subset A, there always 
exists a positive Lyapunov exponent, thus implying the 
existence of a bifurcation chain set whose length is at least 
5.9. Extrapolating these results to their limits in infinite 
dimensions, the number of exponent crossings on the in- 
terval A will monotonically increase with dimension. As 
can be seen from Fig. (14), the exponent zero crossings 
are relatively uniform, with the distance between cross- 
ings decreasing with increasing dimension. Considering 
Fig. (12), the exponent zero crossings are also transverse 
to the s axis. Thus the zero crossings on the interval A, 
which are exactly the points of non-hyperbolicity we are 
searching for, are becoming dense. This is overkill for 
the verification of the existence of a minimal bifurcation 
chain set. This is strong evidence for both conjectures 
and |(21) . It is worth noting that hitting these points 
of hyperbolicity violation upon parameter variation is 
extremely unlikely under any uniform measure on R as 
they are a countable collection of points. [7^ Luckily, this 
does not matter for either the conjecture at hand or for 
any of our other arguments. 
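One way to make "becoming dense" concrete is to track the largest gap between sequential zero crossings on A = [0.1, 6] as the number of crossings grows. The sketch below assumes evenly spaced crossings purely for illustration; the function name `max_gap` and the crossing locations are hypothetical, not measured values.

```python
import numpy as np

def max_gap(crossings):
    """Largest distance between sequential zero crossings."""
    ordered = np.sort(np.asarray(crossings, dtype=float))
    return float(np.diff(ordered).max())

# Hypothetical crossing locations on A = [0.1, 6] for a small and a large network.
gap_few = max_gap(np.linspace(0.1, 6.0, 8))
gap_many = max_gap(np.linspace(0.1, 6.0, 32))
print(gap_few, gap_many)
```

If the largest gap shrinks toward zero as the dimension increases, the perturbation of s needed to reach the next crossing shrinks with it.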



3. Relevance 

The above argument provides direct numerical evidence 
of hyperbolicity violation over a range of the parameter 
space. This is strong evidence supporting con- 
jecture It does not yet verify conjecture Q, but it 
sets the stage as we have shown that there is a significant 
range over which hyperbolicity is violated. The former 
statement speaks to conjecture (@J also; a full explana- 
tion of conjecture l@J requires further analysis, which is 
the subject of a discussion in the final remarks. 

C. Hyperbolicity violation versus parameter 
variation 

We are finally in a position to consider the final argu- 
ments for conjecture (2). To complete this analysis, we 
will need the following pieces of information: 

i. we need the maximum number of positive exponents 
to go to infinity; 

ii. we need a region of parameter space for which 
a-density of Lyapunov exponent zero crossings exists; 
i.e. we need an arbitrarily large number of 
adjoining bifurcation link sets (such that the cardinality 
of the bifurcation chain set becomes arbitrarily 
high) such that for each V_i, the length of V_i, 
l = |b_i - a_i|, approaches zero; 

iii. we need num-continuity of exponents to increase 
as the dimension increases; 

iv. a major simplification can be provided with the existence 
of one global maximum in the number of 
positive exponents and entropy, and along any portion 
of parameter space where s is greater than the 
s at the maximum number of positive exponents, 
the maximum and minimum number of exponents 
occur on the graph at the end points of the parameter 
range (within numerical accuracy). 

The a-density, num-continuity and arbitrary 
number of positive exponents arguments we need have, 
for the most part, been provided in previous sections. 
In this section we will simply apply the a-density and 
num-continuity results in a manner that suits our needs. 
The existence of a single maximum in 
the number of positive exponents, a mere convenience for 
our presentation, is evident from section (V C). We will 
simply rely on all our previous figures and the empirical 
observation that as the dimension is increased above d ~ 
32, for networks that have the typical num-continuity 
(which includes all networks observed for d > 64), there 
exists a single, global maximum in the number of positive 
exponents versus parameter variation. 

1. Qualitative analysis 

The qualitative picture we are using for intuition is that 
of Fig. (12). This figure displays all the information we 
wish to quantify for many networks; as the dimension 
is increased, there is a region of parameter space where 
the parameter variation needed to achieve a topologically 
different attractor (by topologically different, we mean a 
different number of global stable and unstable manifolds) 
decreases to zero. Based on Fig. (12) (and hundreds 
of similar plots), we claim that qualitatively this parameter 
range exists for at least 0.5 < s < 6. 



2. Quantitative and numerical analysis 

Let us now complete our arguments for conjecture (2). 
For this we need a subset of the parameter space, B ⊂ R, 
such that some variation of s ∈ B will lead to a topological 
change in the map f in the form of a change in the 
number of global stable and unstable manifolds. Specifically, 
we need B = ∪V_i = V, where V_i and V_{i+1} share a 
limit point and are disjoint. Further, we need the variation 
in s needed for the topological change to decrease 
monotonically with dimension on V. More precisely, on 
the bifurcation chain set, U, the distance between elements 
must decrease monotonically with increasing dimension. 
We will argue in three steps: first, we will 
argue that, for each f with a sufficiently high number of 
dimensions, there will exist an arbitrarily large number 
of exponent zero crossings (equivalent to an arbitrarily 
large number of positive exponents); next we will argue 
that the zero crossings are relatively smooth; and finally, 
we will argue that the zero crossings form an a-dense set 
on V, or, on the bifurcation chain set, l = |b_i - a_i| → 0 
as d → ∞. This provides strong evidence supporting 
conjecture (2). 

Assume a sufficiently large number of dimensions, ver- 
ification of conjecture Q gives us the existence of the 
bifurcation chain set and the existence of the adjoining 
bifurcation link sets. The existence of an arbitrary number 
of positive Lyapunov exponents, and thus an arbitrarily large 
number of zero crossings, follows from section (V C). That 
the bifurcation chain set has an arbitrarily large number 
of elements, #U → ∞, is established by conjecture 
because, without periodic windows, every bifurcation link 
set will share a limit point with another bifurcation link 
set. From section (V A), the num-continuity of the exponents 
persists for a sufficiently large number of dimensions, 
thus the Lyapunov exponents will cross through 
zero. Finally, section (V B) tells us that the Lyapunov 
exponent zero crossings are a-dense; thus, for all c_i ∈ U, 
|c_i - c_{i+1}| → 0, where c_i and c_{i+1} are sequential elements 
of U. 

Specifically for our work, we can identify U such that 
U ⊂ [0.5, 6]. We could easily extend the upper bound 
to much greater than 6 for large dimensions (d > 128). 
How high the upper bound can be extended will be a 
discussion in further work. 

Finally, it is useful to note that the bifurcation link sets 
are LCE decreasing with increasing s. This is not neces- 
sary to our arguments, but it is a nice added structure 
that aids our intuition. The LCE decreasing property 
exists due to the existence of the single, global maximum 
in the maximum number of positive Lyapunov exponents 
followed by an apparent exponential fall off in the number 
of positive Lyapunov exponents. 

3. Relevance 

The above arguments provide direct evidence of conjec- 
tures and for a one-dimensional curve (specifically 
an interval) in parameter space for our networks. This 
evidence also gives a hint with respect to the robustness 
of chaos in high-dimensional networks with perturbations 
on higher-dimensional surfaces in parameter space. Fi- 
nally, despite the seemingly inevitable topological change 
upon minor parameter variation, the topological change 
is quite benign. 

VII. FITTING EVERYTHING TOGETHER 

Having finished with our specific analysis, it is now 
time to put our work in the context of other work, both 
of a more mathematical and a more practical and exper- 
imental nature. In this spirit, we will provide, first, a 
brief summary of our arguments followed by a discussion 
of how our results fit together with various theoretical 
results from dynamical systems and turbulence. 

A. Summary of arguments 

We will give brief summaries of our results, both in the 
interest of clarity and to relate our results and methods 
to others. 

1. Periodic window probability decreasing: conjecture\^ 

The conjecture that the probability of periodic win- 
dow existence for a given s value along an interval in pa- 
rameter space decreases with increasing dimension upon 
the smallest positive Lyapunov exponent becoming pos- 
itive, is initially clear from considering the Lyapunov 
spectra of neural networks versus parameter variation 
for networks of increasing size (Figs. iQ and ||SJ)). We 
show that as the dimension is increased, the observed 
probability of periodic windows decreases inversely with 
increase in dimension. The motivation for arguing in 
this way is simple; this analysis is independent of the 
num-continuity analysis, and the results from the analysis 
of num-continuity and periodic window probability 
decrease reinforce each other. The mechanism that this 
conjecture provides us with is the lengthening of the bi- 
furcation chain set. 

Further investigations of this particular phenomenon 
will follow in a later report. For other related results 
see [12, [13, m, and 
2. Hyperbolicity violation: conjecture^ 

The intuition for this conjecture arises from observ- 
ing that for our high-dimensional systems, there exists 
at least one Lyapunov exponent that starts negative, be- 
comes positive, then goes negative again; thus if it be- 
haves numerically continuously, it must pass through zero 
for some parameter value s. 

To verify this conjecture, we presented two different 
arguments. The first argument was a necessary but not 
sufficient condition for hyperbolicity violations. We show 
that over a sizeable interval in parameter space, there exists 
a Lyapunov exponent very near zero, and the fraction 
of the total number of Lyapunov exponents that are near 
zero increases over a larger interval of parameter space 
as the dimension is increased. The second argument was 
based on the a-density of exponent zero crossings, the 
num-continuity of the exponents as the dimension increased, 
and the increasing number of positive exponents 
with dimension. Together, both arguments imply an 
interval of parameter space on which 
the number of parameter values at which hyperbolicity 
is violated is increasing. 



3. Existence of Codimension-e bifurcation set: conjectured^ 

Conjecture Q is the next step in relating our results 
with the results of structural stability theory. Given 
the results supporting conjecture 1^, conjecture (0) only 
needs a few added bits of evidence for its vindication. 

The intuition for this argument follows from observ- 
ing that the peak in the number of positive Lyapunov 
exponents tends toward a spike of increasing height and 
decreasing width as the dimension is increased. This, 
with some sort of continuity of exponents, argues for a 
decrease in distance between exponent zero crossings. 

A summary for the arguments regarding conjecture 
is as follows. With increasing dimension we have: 
increased num-continuity of Lyapunov exponents; an increasing 
number of positive Lyapunov exponents; and 
a-density of Lyapunov exponent zero crossings (thus all 
the exponents are not clustered on top of each other). 
Thus, on a finite set in parameter space, we have an 
arbitrary number of exponents that move numerically 
smoothly from negative values, to positive values, and 
back to negative values. Further, these exponents are relatively 
evenly spaced. Thus, the set in parameter space 
for which hyperbolicity is violated is increasingly dense; 
and with an arbitrary number of violations available, 
the perturbation of the parameter required to force a 
topological change (a change in the number of positive 
exponents) becomes small. 



4. Non-genericity of structural stability: conjecture |7| 

As previously mentioned, it could appear that our results 
are contrary to Robbin [2], Robinson [3], and Mañé 
01 . We will discuss specifically how our results fit with 
theirs in section (VII C). In the current discussion, we 
wish to properly interpret our results in a numerical context. 

We claim to have found a subset of parameter space 
that, in the limit of infinite dimensions, has dense hy- 
perbolicity violation. This could be interpreted to imply 
that we have located a set for which strict hyperbolic- 
ity does not imply structural stability, because the 
changes in the parameter give rise to topologically dif- 
ferent behaviors. The key issue to realize is that in nu- 
merical simulations, there do not exist infinitesimals or 
infinite-dimensional limits [t^. Rather, we can speak to 
how behaviors arise, and how limits behave along the 
path to the ideal. We have found a subset of parameter 
space that we believe can approximate (with unlimited 
computing) arbitrarily closely a set for which hyperbol- 
icity will not imply structural stability. Thus, an exper- 
imentalist or a numerical physicist might see behavior 
that looks like it violates the results of Robbin , Robin- 
son , and Mane Q ; yet it will not strictly be violating 
those theorems. The key point of this conjecture is that 
we can observe apparent violation of the structural stabil- 
ity conjecture, but the violation (on a Lebesgue measure 
zero set) occurs as smooth, not catastrophic, topological 
change. (In section (VII C 1) we will further discuss our 
results as they relate to those of Robbin 0, Robinson 
[3], and Mañé 0.) 



5. Robust chaos: conjecture\^ 

That chaos is a robust behavior for bounded high- 
dimensional dynamical systems is not particularly sur- 
prising, especially in light of Fig. |0J, information pre- 
sented in sections (V C) and (VI A), and previous work 
[24]. Beyond casual observation, we will not comment 
because it is the topic of a work in progress [56]. It is 
important to note that we do not observe sinks or pe- 
riodic windows in the chaotic region of parameter space 
for a sufficiently high dimension. This particular char- 
acteristic is, however, somewhat heartening if one is to 
compare our results with many high-dimensional chaotic 
and turbulent natural systems as these systems are con- 
stantly being perturbed, yet their behavior is relatively 
robust. Readers interested in arguments along the lines 
of conjecture Q are directed to [53, 113, [Eg, 113, or 
l59l for further information. 
B. Fitting our results in with the space of 
function: how our network selection method affects 
our view of function space 

Performing a numerical experiment induces a measure 
upon whatever is being experimented upon. We now dis- 
cuss some of the characteristics of our imposed measure 
and how they might affect our results. Recall, often in 
mathematics, it is desirable to prove that various results 
are invariant to the measure imposed upon the space; in 
our case this would be extremely difficult if not impossi- 
ble, thus we will resort to the aforementioned, standard 
experimental style. 

A measure, in a very general sense, provides a method 
of measuring the volume a set occupies in its ambi- 
ent space (for a formal treatment, see Usually 
that method provides a specific mechanism of measuring 
lengths of a covering interval. Then, the entire space is 
covered with the aforementioned intervals, and their col- 
lective volume is summed. One of the key issues is how 
the intervals are weighted. For instance, consider the 
real line with the standard Gaussian measure imposed 
upon it; the interval [-1, 1] contains the majority of the 
volume of the entire interval (-∞, ∞). Our method of 
weighting networks selects fully connected networks with 
random Gaussian weights. Thus, in the limit of high dimension 
and high number of neurons, very weakly connected 
networks will be rare, as the Gaussian statistics of the 
weights will be dominant. Likewise, fully connected net- 
works where all the weights have the same strength (up 
to an order of magnitude) will also be uncommon. One 
can argue whether our measure realistically represents 
the function space of nature, but those arguments are 
ill-formed because they cannot be answered without ei- 
ther specific information about the natural system with 
which our framework is being compared, or the existence 
of some type of invariant measure. Nevertheless, our 
framework does cover the entire space of neural networks 
noted in section although all sets do not have equal 
likelihood of being selected, and thus our results must be 
interpreted with this in mind. 
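The Gaussian example above can be checked with a short calculation. For a standard normal measure, the mass of an interval [a, b] follows from the error function; the helper name `gaussian_mass` is ours, introduced only for this sketch.

```python
import math

def gaussian_mass(a, b):
    """Mass assigned to [a, b] by the standard (mean 0, variance 1) Gaussian measure."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

mass_unit = gaussian_mass(-1.0, 1.0)
print(mass_unit)  # roughly 0.68, i.e. the majority of the total mass
```

The same weighting is what makes very weakly connected networks, and networks with nearly identical weights, rare under our selection method.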

A second key issue regards how the ambient space is 
split into intervals; or in a numerical sense, how the 
grain of the space is constructed. We will again in- 
troduce a simpler case for purposes of illustration, fol- 
lowed by a justification of why the simpler case and 
our network framework are essentially equivalent. Be- 
gin with R^n and select each coordinate (v_i) in the vector 
v = {v_1, v_2, ..., v_n} ∈ R^n from a normal, i.i.d. distribution 
with mean zero, variance one. Next, suppose that 
we are attempting to see every number and every number 
combination. This will be partially achieved by the random 
number selection process mentioned above, and it is 
further explored by sweeping the variance, i.e. selecting 
a scalar s ∈ R, 0 < s, and sweeping s over the positive 
real line, sv. This establishes two meshes, one for the 
individual vectors which is controlled by how finely the s 
parameter is varied, and another mesh that controls how 
the initial coordinates are selected. These two combined 
meshes determine the set of combinations of coordinates 
that will be observed. If one considers how this affects 
vector selection in, say, for simplicity, both in the ini- 
tial vector selection and in the vector sweeping, it is clear 
how will be carved out. 
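The two meshes described above can be sketched directly: draw a vector with i.i.d. standard normal coordinates, then sweep a positive scalar s along it. The function name `sweep`, the seed, the dimension, and the grid of s values are illustrative assumptions, not the settings of our experiments.

```python
import numpy as np

def sweep(v, s_values):
    """Return the family s*v, one row per value of s."""
    return np.outer(s_values, v)

rng = np.random.default_rng(0)           # fixed seed, chosen only for repeatability
v = rng.standard_normal(4)               # first mesh: random coordinate selection
s_values = np.linspace(0.1, 6.0, 60)     # second mesh: how finely s is varied
swept = sweep(v, s_values)

# Every swept vector lies on the same ray through the origin as v.
directions = swept / np.linalg.norm(swept, axis=1, keepdims=True)
print(swept.shape, directions[0], directions[-1])
```

Each random draw fixes a direction and the sweep explores that ray, so the combinations of coordinates actually observed are determined by both meshes together.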

The point of the above paragraph is simply this: we are 
associating how we carve up our neural network function 
space with how we carve up the neural network weight 
space. It should be clear that this is comparing apples to 
apples. In the above paragraph, to understand how our 
neural network selection process works, simply associate 
v with the vectors in the w matrix and the scaling parameter 
s with s. This keeps the view of our function space 
largely in standard Euclidean space. Of course there is 
the last remaining issue of the amplitude terms, the β's. 
Apply the same type of analysis to the β's as we did for 
the w's in the above paragraph. Of course initially it 
would seem that the scaling parameter is missing, but 
note that multiplying the β's by s, in our networks, is 
essentially equivalent to multiplying the w's by s. To 
understand this, consider the one-dimensional network, 
with one neuron: 

x_{t+1} = \beta_0 + \beta_1 \tanh(s w_0 + s w_1 (\beta_0 + \beta_1 \tanh(s w_0 + s w_1 x_{t-1})))    (39) 

It is clear from this that inserting s inside tanh will sweep 
the β's, but inserting s outside the squashing function 
will miss sweeping the w_0 bias term. j7q 
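Equation (39) is the one-neuron, one-dimensional map applied twice, with the inner step substituted into the outer one. The sketch below checks that reading numerically; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def step(x, s, beta0, beta1, w0, w1):
    """One iteration of the one-neuron network: x -> beta0 + beta1*tanh(s*w0 + s*w1*x)."""
    return beta0 + beta1 * math.tanh(s * w0 + s * w1 * x)

def nested(x_prev, s, beta0, beta1, w0, w1):
    """Right-hand side of Eq. (39), written out with the inner step substituted."""
    inner = beta0 + beta1 * math.tanh(s * w0 + s * w1 * x_prev)
    return beta0 + beta1 * math.tanh(s * w0 + s * w1 * inner)

# Hypothetical parameter values.
params = dict(s=2.0, beta0=0.3, beta1=-0.7, w0=0.1, w1=1.5)
x_prev = 0.25

two_steps = step(step(x_prev, **params), **params)
closed_form = nested(x_prev, **params)
print(two_steps, closed_form)
```

Written this way, s multiplies the inner output, which contains β_0 and β_1, as well as w_0 and w_1, which is the sense in which placing s inside the squashing function also sweeps the β's.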

From this it should be clear that our framework will 
capture the entire space of neural networks we are em- 
ploying. Yet, it should also be clear that we will not 
select each network with equal probability. Weakly con- 
nected networks will not be particularly common in our 
study, especially as the number of dimensions and neu- 
rons increase, because the statistics of our weights will 
more closely resemble their theoretical distributions. It 
is also worth noting that a full connection between net- 
work structure and dynamics, in a sensible way, is yet 
out of reach (as opposed to, say, for spherical harmon- 
ics). Nevertheless, we claim that our framework gives 
a complete picture of the space of C' maps of compact 
sets to compact sets with the Sobolev metric from the 
perspective of a particular network selection method. 

C. Our results related to other results in 
dynamical systems 

As promised throughout, we will now connect our re- 
sults with various theorems and conjectures in the field 
of dynamical systems. This will hopefully help put our 
work in context and increase its understandability. We 
will address how our work fits in with the stability conjecture 
of Smale and Palis [1]. First we will discuss our 
results and the structural stability theories of Robbin 
and Mañé 0| which state that structurally stable systems 
must be hyperbolic. We will follow this by relating our 
studies to the work in partial hyperbolicity and stable 
ergodicity - the reaction to difficulties in showing that 
hyperbolic systems are structurally stable. We will con- 
clude this portion of the summary by discussing how our 
work relates to one of the conjectures from a paper by 
Palis 



1. Structural stability theory and conjecture^ 

It is now time to address the apparent conflict between 
our observations and the structural stability theorems of 
Robbin 0, Robinson 0], and Mañé 01. We would like 
to begin by noting that we do not doubt the validity 
or correctness of any of the aforementioned results. In 
fact, any attempt to use our techniques and results to 
provide a counter example to the theorems of Robbin, 
Robinson, or Mañé involves a misunderstanding of what 
our methods are able to do and indeed intend to imply. 

In conjecture |0J we claim, in an intuitive sense, that 
along a one-dimensional curve in parameter space, our 
dynamical systems are hyperbolic with measure one, with 
respect to Lebesgue measure. Yet, we can still find sub- 
sets that are measure zero, yet a-dense, for which our 
dynamical systems are partially hyperbolic rather than 
hyperbolic. The motivation for the above statement 
roughly derives from thinking of a turbulent fluid. In 
this circumstance, the number of unstable manifolds can 
be countably infinite, and upon varying, say, the viscos- 
ity, from very low to very high, one would have a count- 
able number of exponents becoming positive over a finite 
length of parameter space. Yet, all the limits of this sort 
and all the intuitive ideas with respect to what will hap- 
pen in the infinite-dimensional limit, are just that, ideas. 
There are limits to what we can compute; there do not 
exist infinite-dimensional limits in numerical computing; 
there do not exist infinitesimals in numerical computing; 
and aside from the existence of convergence theorems, we 
are left unable to draw conclusions beyond what our data 
says. Thus, our results do not provide any sort of counter 
example to the stability conjecture. Rather, a key point 
of our results is that we do observe, in a realistic numer- 
ical setting, structural instability upon small parameter 
variation. It is useful to think instead of structural stabil- 
ity as an open condition on our parameter space whose 
endpoints correspond to the points of structural insta- 
bility - the points of bifurcations in turbulence. These 
disjoint open sets are precisely the bifurcation link subsets, 
V_i, for which the map f is structurally stable. As the 
dimension is increased, the length of the V_i's decreases 
dramatically, and may fall below numerical or experimental 
resolution. Thus, the numerical or experimental scientist 
might observe, upon parameter variation, systems 
that should, according to the work of Robbin, Robinson, 
and Mañé, be structurally stable undergoing topological 
variation in the form of a variation in the number 
of positive Lyapunov exponents; i.e. the scientist might 
observe structural instability. This is the very practical 
difference between numerical computing and the world of 
strict mathematics. (Recall we were going to attempt to 
connect structural stability theory closer to reality; the 
former statement is as far as we will go in this report.) 
The good news is that even though observed structural 
stability might be lost, it is lost in a very meek manner 
- the topological changes are very slight, just as seems 
to be observed in many turbulent experimental systems. 
Further, partial hyperbolicity is not lost, and the dynam- 
ically stable characteristics of stable ergodicity seem to 
be preserved, although we obviously can't make a strict 
mathematical statement. 

Thus, rather than claiming our results are contrary 
to those of Robbin p|, Robinson 0], and Mañé 0, we 
note that our results speak both to what might be seen 
of those theorems in high-dimensional dynamical systems 
and how their results are approached upon increasing the 
dimension of a dynamical system. 

It is worth noting that, given a typical 64-dimensional 
network, if we fixed s at such a point that there was an 
exponent zero crossing, we believe (based upon prelimi- 
nary results) that there will exist many perturbations of 
other parameters that leave the exponent zero crossing 
unaffected. However, it is believed at this time that these 
perturbations are of very small measure (with respect to 
Lebesgue measure), and of a small codimension set, in parameter 
space; i.e. we believe we can find perturbations 
that will leave the seemingly transversal intersection of 
an exponent with zero at a particular s value unchanged, 
yet these parameter changes must be small. 

2. Partial hyperbolicity 

In this study we are particularly concerned with the in- 
terplay, along a parameterized curve, of how often partial 
hyperbolicity is encountered versus strict hyperbolicity. 
It should be noted that if a dynamical system is hyper- 
bolic, it is partially hyperbolic. All of the neural net- 
works we considered were at least partially hyperbolic; 
we found no exceptions. Many of the important questions 
regarding partially hyperbolic dynamical systems 
lie in showing the conditions under which such systems 
are stably ergodic. We will now discuss this in relation 
to our results and methods. 

Pugh and Shub put forth the following conjecture 
regarding partial hyperbolicity and stable ergodicity: 

Conjecture 6 (Pugh and Shub [28] Conjecture 3) 

Let f ∈ Diff^2_μ(M) where M is compact. If f is partially 
hyperbolic and essentially accessible, then f is ergodic. 

In that same paper they also proved the strongest result 
that had been shown to date regarding their conjecture: 

Theorem 3 (Pugh-Shub theorem (theorem A 
[28])) If f ∈ Diff^2_μ(M) is a center bunched, partially 
hyperbolic, dynamically coherent diffeomorphism with the 
essential accessibility property, then f is ergodic. 
A diffeomorphism is partially hyperbolic if it satisfies 
the conditions of definition {T)). Ergodic behavior im- 
plies that, upon breaking the attractor into measurable 
sets, A_i, for f applied to each measurable set for enough 
time, f^n(A_i) will intersect every other measurable set 
A_j. This implies a weak sense of recurrence; for in- 
stance, quasi-periodic orbits, chaotic orbits, and some 
random processes are at least colloquially ergodic. More 
formally, a dynamical system is ergodic if and only if 
almost every point of each set visits every set with pos- 
itive measure. The accessibility property simply formal- 
izes a notion of one point being able to reach another 
point. Given a partially hyperbolic dynamical system, 
f : X ^ X such that there is a splitting on the tangent 
bundle TM ^ E'^ ® E" ® E'' , and x, ?/ S X, y is accessible 
from X if there is a path from a; to y whose tangent 
vector lies in n and vanishes finitely many times. 
The diffeomorphism $f$ is center bunched if the spectra
of $Tf$ (as defined earlier) corresponding to the
stable ($T^s f$), unstable ($T^u f$), and central ($T^c f$) directions
lie in thin, well-separated annuli (see the cited reference, page 131,
for more detail; the radii of the annuli are technical and
are determined by the Hölder continuity of the diffeomorphism).
Lastly, let us note that a dynamical system is
called stably ergodic if, given $f \in \mathrm{Diff}^2_\mu(M)$ (again $M$
compact), there is a neighborhood $\mathcal{Y}$, with $f \in \mathcal{Y} \subset \mathrm{Diff}^2_\mu(M)$,
such that every $g \in \mathcal{Y}$ is ergodic with respect to $\mu$. We
will refrain from explaining dynamical
coherence; it is a crucial hypothesis for the proof
of theorem 3, but we will have little to say in its regard.
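
For reference, the formal notion of ergodicity alluded to above is usually stated through the Birkhoff ergodic theorem. Assuming $\mu$ is an $f$-invariant probability measure on $M$ and $\varphi$ is any $\mu$-integrable observable (the symbol $\varphi$ is introduced here only for illustration and does not appear in the original text), $f$ is ergodic with respect to $\mu$ exactly when time averages equal space averages:

```latex
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \varphi\bigl(f^{n}(x)\bigr)
  = \int_{M} \varphi \, d\mu
  \qquad \text{for } \mu\text{-almost every } x \in M .
```

It is this equality that licenses replacing an average over the attractor by an average along a single long trajectory, which is how quantities such as Lyapunov exponents are estimated numerically.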

An actual numerical verification of ergodicity can be
somewhat difficult, as the modeler would have to watch
each point and verify that eventually the trajectory
returned very close to every other point on the orbit (i.e.,
that it satisfies the Birkhoff hypothesis). Doing this for a few
points is, of course, possible, but doing it for a
high-dimensional attractor for any sizable number of points
can be extremely time consuming. Checking the accessibility
criterion seems to pose similar problems; in fact,
it is hoped that accessibility is the sufficient recurrence
condition for ergodic behavior. Thus it should be no
surprise that accessibility would be difficult to check
numerically (it has been shown to be dense). In the
reality of computing, there is a far more practical way of
checking for ergodic behavior, motivated by a more practical
problem in numerical computing: transients. For a
mathematician, ergodic tools can be applied whenever
the system can be shown to be ergodic. In numerical
work, proving that the necessary conditions for the use
of ergodic measures hold is often intractable. Besides, for
numerical applications, proving long-term behavior is often
not good enough to justify the use of an ergodic diagnostic,
since the relaxation from the transients to the ergodic state
can, at times, be prohibitively slow and sometimes difficult
to detect. There are even times when the numerical
errors in the calculations effectively reset the transients.
The practical solution to this is to apply the ergodic
measures and, along with the time-series data, watch the
transients disappear. We did this specifically in section
IV to justify our use of ergodic measures. If the errors
in the ergodic measures, along with the transients of the
attractors, decrease with time, then we call the system
ergodic and feel justified in using ergodic measures, such
as Lyapunov exponents.
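
The convergence check described above can be sketched in a few lines. The following is a minimal illustration and not the code used for the results in this paper: it takes the two-dimensional Hénon map as a stand-in for the networks studied here, estimates both Lyapunov exponents by repeatedly reorthonormalizing a tangent frame (Gram-Schmidt), and records the running estimates at checkpoints so that their drift can be watched as the transients decay. The function names, the choice of map, and all parameter values are assumptions made for the example.

```python
import math


def henon(x, y, a=1.4, b=0.3):
    """One step of the Henon map (an illustrative stand-in system)."""
    return 1.0 - a * x * x + y, b * x


def henon_jacobian(x, y, a=1.4, b=0.3):
    """Jacobian of the Henon map at (x, y), returned row by row."""
    return (-2.0 * a * x, 1.0), (b, 0.0)


def running_spectrum(n_steps=20000, n_transient=1000, report_every=5000):
    """Return [(t, lam1, lam2), ...]: running exponent estimates at checkpoints."""
    x, y = 0.0, 0.0
    for _ in range(n_transient):          # discard the initial transient
        x, y = henon(x, y)
    u, v = (1.0, 0.0), (0.0, 1.0)         # orthonormal tangent frame
    s1 = s2 = 0.0                         # accumulated log stretching factors
    history = []
    for t in range(1, n_steps + 1):
        (j11, j12), (j21, j22) = henon_jacobian(x, y)
        # Push both frame vectors forward with the Jacobian.
        u = (j11 * u[0] + j12 * u[1], j21 * u[0] + j22 * u[1])
        v = (j11 * v[0] + j12 * v[1], j21 * v[0] + j22 * v[1])
        # Gram-Schmidt: normalize u, remove its component from v, normalize v.
        nu = math.hypot(u[0], u[1])
        u = (u[0] / nu, u[1] / nu)
        dot = v[0] * u[0] + v[1] * u[1]
        v = (v[0] - dot * u[0], v[1] - dot * u[1])
        nv = math.hypot(v[0], v[1])
        v = (v[0] / nv, v[1] / nv)
        s1 += math.log(nu)
        s2 += math.log(nv)
        x, y = henon(x, y)
        if t % report_every == 0:
            history.append((t, s1 / t, s2 / t))
    return history


def last_drift(history):
    """Largest change in either exponent between the final two checkpoints."""
    (_, a1, a2), (_, b1, b2) = history[-2], history[-1]
    return max(abs(b1 - a1), abs(b2 - a2))


if __name__ == "__main__":
    for t, lam1, lam2 in running_spectrum():
        print(t, lam1, lam2)
```

If the drift between successive checkpoints fails to shrink as the run is lengthened, the transients have not yet decayed and the estimates should not be trusted. For this particular map the Jacobian determinant is constant, so the sum of the two exponents must equal ln(0.3); that identity gives an independent sanity check on the bookkeeping.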

Considering the figures that show the convergence of the
ergodic measures, it seems clear that our networks are
ergodic. Further, upon considering the figures in which a
one-dimensional parameter is varied (Fig. 12 among them),
ergodic behavior is preserved. Of course, showing that one
has explored all the variations inside the neighborhood
$f \in \mathcal{Y} \subset \mathrm{Diff}^2_\mu(M)$ is impossible; thus claiming that
we have, in a mathematically rigorous way, observed stable
ergodicity as the predominant characteristic would be
premature. Further, we can say little about the accessibility
property. What we can say is that we have never
observed a dynamical system, within our construction,
that is not on a compact set, is not partially hyperbolic,
or is not stably ergodic. Thus, our results provide
evidence that the conjecture of Pugh and Shub is on track.
For more information with respect to the mathematics 
discussed above, see 0' 113 ■ 

Comparing conjecture 6 to theorem 3, the extra
hypotheses required for the proof of the theorem are
dynamical coherence and center bunching of the spectrum
of $Tf$. Pugh and Shub, and others, have been attempting
to eliminate these extra hypotheses. Our results speak
little to the issue of dynamical coherence, but they can
speak to the issue of center bunching. Considering the plot
of the Lyapunov spectrum versus s, at any value of the s
parameter there is no evidence of center bunching, or any
real bunching of Lyapunov exponents at all. In fact, if
there were center bunching, our a-density of exponent
zero crossing argument would be in serious trouble. Thus,
we claim that we have strong evidence for the removal of
the center bunching requirement for stable ergodicity. And,
since we are claiming that our dynamical systems are
seemingly ergodic, if center bunching were required of
stable ergodicity, we claim that stable ergodicity would be
too strict a distinguishing characteristic for dynamic stability.
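
One rough way to make "no real bunching of Lyapunov exponents" quantitative is sketched below. It is only a crude proxy and not the center bunching condition of Pugh and Shub, which concerns the spectra of $Tf$ restricted to the sub-bundles. The idea is to split a sorted spectrum at its two largest gaps into three groups and compare the width of the widest group with the smaller of the two separating gaps. The function names, the threshold `ratio`, and the example spectra are all hypothetical.

```python
def bunching_summary(exponents):
    """Split a sorted spectrum at its two largest gaps.

    Returns (max_group_width, min_separating_gap).
    """
    lam = sorted(exponents)
    if len(lam) < 3:
        raise ValueError("need at least three exponents")
    # (gap size, index of the lower neighbor), largest gaps first.
    gaps = sorted(
        ((lam[i + 1] - lam[i], i) for i in range(len(lam) - 1)),
        reverse=True,
    )
    first, second = sorted(i for _, i in gaps[:2])
    groups = [lam[:first + 1], lam[first + 1:second + 1], lam[second + 1:]]
    max_width = max(g[-1] - g[0] for g in groups)
    min_gap = min(lam[first + 1] - lam[first], lam[second + 1] - lam[second])
    return max_width, min_gap


def looks_bunched(exponents, ratio=0.1):
    """True if every group is much narrower than the gaps separating them."""
    max_width, min_gap = bunching_summary(exponents)
    return max_width < ratio * min_gap


# Hypothetical spectra: three tight clusters versus an evenly spread spectrum.
clustered = [-2.0, -1.99, -1.98, -0.01, 0.0, 0.01, 1.5, 1.51]
spread = [-1.0 + 0.25 * k for k in range(9)]
print(looks_bunched(clustered), looks_bunched(spread))
```

An evenly spread spectrum, which is what the text reports observing, fails this test because no gap stands out from the others.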

3. Our results and Palis's conjectures 

Palis stated many stability conjectures based upon 
the last thirty years of developments in dynamical sys- 
tems. We wish to address one of his conjectures: 

Conjecture 7 (Palis conjecture II) In any dimension,
the diffeomorphisms exhibiting either a homoclinic
tangency or a (finite) cycle of hyperbolic periodic
orbits with different stable dimensions (heterodimensional
cycle) are $C^r$ dense in the complement of the closure
of the hyperbolic ones.

Let us unpack this for a moment, and then discuss
how our results fit with it. Begin by defining the
space of d-dimensional $C^r$ diffeomorphisms as $X$. Next,
break that space up as follows: $A = \{x \in X \mid x$ exhibits
a homoclinic tangency or a finite cycle of hyperbolic
periodic orbits with different stable dimensions$\}$
and $B = \{x \in X \mid x$ is hyperbolic$\}$. Thus B is the set of
hyperbolic, aperiodic diffeomorphisms, and A is the set
of diffeomorphisms exhibiting periodic orbits or partially
hyperbolic orbits. The conjecture states that A is dense in
the complement of the closure of B; thus A can be dense in B.
With respect to our results, the partially hyperbolic
diffeomorphisms (diffeomorphisms with homoclinic tangencies)
can be dense within the set of hyperbolic diffeomorphisms.
Our conjectures claim to find a subset of our one-dimensional
parameter space such that partially hyperbolic
diffeomorphisms will, in the limit of high dimensions, be dense. In
other words, our work not only agrees with Palis's
conjecture II (and subsequently his conjecture III), but
also provides evidence confirming Palis's conjectures. Of
course, we do not claim to provide mathematical proofs,
but rather strong numerical evidence supporting Palis's
ideas.
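
In set notation, and assuming that closures are taken in the $C^r$ topology on $X$, the density statement of the conjecture can be written as follows (a restatement for clarity, not an additional result):

```latex
X \setminus \overline{B} \;\subseteq\; \overline{A}
\qquad \Longleftrightarrow \qquad
\overline{A} \cup \overline{B} \;=\; X .
```

That is, every diffeomorphism in $X$ can be approximated either by a hyperbolic one or by one exhibiting a homoclinic tangency or a heterodimensional cycle.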

D. Final remarks 

Finally, let us briefly summarize: 

Statement of Results 2 (Summary) Assuming
our particular conditions and our particular space of
dynamical systems as defined earlier, there exists
a collection of bifurcation link subsets (V) such that,
in the limit of countably infinite dimensions, we have
numerical evidence for the following:

Conjecture: on the above-mentioned set V, strict
hyperbolicity will be violated a-densely.

Conjecture: on the above-mentioned set V, the number
of stable and/or unstable manifolds will change under
parameter variation below numerical precision.

Conjecture: on the above-mentioned set V, the probability
of the existence of a periodic window for a given s
on a specific parameter interval decreases inversely with
dimension.

Conjecture: on the above-mentioned set V, hyperbolic
dynamical systems are not structurally stable within
numerical precision with measure one with respect to
Lebesgue measure in parameter space.

In a measure-theoretic sense, hyperbolic systems occupy
all the space, but the partially hyperbolic dynamical
systems (with non-empty center manifolds) can be a-dense
on V. Intuitively, if there are countably many dimensions,
and thus countably many Lyapunov exponents, then one of
two things can happen upon parameter variation:

i. there would have to be a persistent homoclinic
tangency, or some other sort of non-transversal
intersection between stable and unstable manifolds,
that was persistent to parameter changes;

ii. there can be, at most, countably many parameter points
such that there are non-transversal intersections
between stable and unstable manifolds.
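
The second scenario can be explored numerically by locating the parameter values at which individual Lyapunov exponents pass through zero. The sketch below is illustrative only: it assumes each exponent has been sampled on a grid of s values, finds sign changes by linear interpolation, and reports the largest crossing-free parameter interval, a quantity in the spirit of the a-density used in the text. The helper names and the synthetic straight-line "exponent curves" are hypothetical and are not data from this study.

```python
def zero_crossings(s_values, exponent_curves):
    """Locate sign changes of each exponent curve along the parameter grid.

    Returns a sorted list of (s_estimate, curve_index); each s_estimate is
    obtained by linear interpolation between neighboring grid points.
    """
    crossings = []
    last = len(s_values) - 1
    for idx, curve in enumerate(exponent_curves):
        for k in range(last):
            a, b = curve[k], curve[k + 1]
            if a == 0.0:
                # The exponent is exactly zero on a grid point.
                crossings.append((s_values[k], idx))
            elif a * b < 0.0:
                frac = a / (a - b)
                step = s_values[k + 1] - s_values[k]
                crossings.append((s_values[k] + frac * step, idx))
        if curve[last] == 0.0:
            crossings.append((s_values[last], idx))
    return sorted(crossings)


def largest_crossing_free_gap(crossings, s_min, s_max):
    """Length of the longest parameter interval containing no zero crossing."""
    points = [s_min] + [s for s, _ in crossings] + [s_max]
    return max(q - p for p, q in zip(points, points[1:]))


def synthetic_curves(d, s_values):
    """Hypothetical data: d straight lines, curve i crossing zero at (i+1)/(d+1)."""
    return [[(i + 1) / (d + 1) - s for s in s_values] for i in range(d)]


s_grid = [k / 100 for k in range(101)]
for d in (4, 9, 19):
    found = zero_crossings(s_grid, synthetic_curves(d, s_grid))
    print(d, len(found), round(largest_crossing_free_gap(found, 0.0, 1.0), 6))
```

By construction the synthetic crossings are evenly spaced, so the largest crossing-free interval shrinks like 1/(d+1) as the number of exponents d grows. For real spectra the spacing would have to be measured rather than assumed.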

We also see that for our networks, each exponent in 
the spectrum converges to a unique (within numerical 
resolution) value. This both confirms the usefulness and 
validity of our techniques, and provides strong evidence 
for the prevalence of ergodic behavior. Further, upon 
parameter variation, the ergodic behavior is seemingly 
preserved; thus we also have strong evidence of a preva- 
lence of stable ergodic behavior. 